GA Based Q-Attack on Community Detection


Jinyin Chen, Lihong Chen, Yixian Chen, Minghao Zhao, Shanqing Yu, Qi Xuan, Xiaoniu Yang

J. Chen, L. Chen, Y. Chen, M. Zhao, S. Yu, and Q. Xuan are with the Institute of Cyberspace Security and the College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China (e-mail: chenjinyin@zjut.edu.cn; coffeeclh@163.com; yichixchen@163.com; yzbyzmh1314@163.com; yushanqing@zjut.edu.cn; xuanqi@zjut.edu.cn). X. Yang is with the Institute of Cyberspace Security, Zhejiang University of Technology, Hangzhou 310023, China, and with the Science and Technology on Communication Information Security Control Laboratory, Jiaxing 314033, China (e-mail: yxn2117@126.com). This article was submitted on October 21, 2018. Corresponding author: Qi Xuan.
Abstract

Community detection plays an important role in social network analysis, since it naturally divides a network into smaller parts and thus simplifies network analysis. On the other hand, it also raises the concern that individual information may be over-mined, and the concept of community deception has therefore been proposed to protect individual privacy on social networks. Here, we introduce and formalize the problem of community detection attack and develop efficient strategies to attack community detection algorithms by rewiring a small number of connections, thereby protecting individual privacy. In particular, we first give two heuristic attack strategies, i.e., Community Detection Attack (CDA) and Degree Based Attack (DBA), as baselines, which utilize the information of the detected community structure and of node degree, respectively. We then propose a Genetic Algorithm (GA) based Q-Attack, whose fitness function is designed using modularity. We launch community detection attacks based on these three strategies against three modularity-based community detection algorithms on two social networks. By comparison, our Q-Attack method achieves much better attack effects than CDA and DBA, in terms of larger reductions of both modularity and Normalized Mutual Information (NMI). Besides, we find that the adversarial networks obtained by Q-Attack against a specific community detection algorithm remain effective against other algorithms, whether modularity-based or not, indicating strong transferability.

Index Terms: community detection, community detection attack, genetic algorithm, social network, privacy protection, modularity, adversarial network.

I Introduction

Complex networks can represent a wide range of complex systems in our daily life, such as social networks [1, 2], biological networks [3, 4], power networks [5] and financial networks [6]. Network science has become a very active field because of its highly interdisciplinary nature [7]. One hot topic in this field is community detection. Since the concept of community structure was proposed [8], community detection algorithms have attracted great attention from various disciplines [9, 10, 11]. Detecting the communities of a network is important for gaining a deep understanding of its organization and functions.

However, with the rapid development of community detection algorithms in recent years [12], a new challenge arises: information over-mining. People have realized that some of their information, even private information, may be over-mined by social network analysis tools. The websites we browse, the people we contact, and many similar activities are all recorded on the Internet, and our interests, circles of friends, and business partners can easily be inferred once these data are exploited [13, 14]. Community hiding or deception [15, 16, 17] has been put forward to hide certain communities. More broadly, from both the individual and the global view, a community detection attack can be launched on a network to change some of its structure, thereby degrading the performance of related algorithms and lowering the accuracy of their results.

The problem of making communities harder to detect through attacks against detection algorithms is worth studying, because good attack strategies can guide individuals or organizations in disguising their social circles and changing the way they interact with others, so that private information does not leak [16]. In this paper we focus on designing attack strategies and testing their effectiveness through experiments. In particular, we propose two heuristic attack strategies to address the problem described in the previous paragraph. Heuristic strategies are valuable because they are easy to implement and sometimes perform well. When designing them, we take community detection algorithms and some properties of complex networks into consideration. One question is which kinds of nodes should be chosen as targets to benefit the attack. Node centrality is a commonly used index to measure the importance of nodes in a network, and mainly includes degree, betweenness and closeness. Compared with betweenness, whose calculation requires global information, degree depends only on local information and requires less computation [18]. Thus, we propose an attack strategy based on degree and test it.

Furthermore, we consider the design of an attack strategy as an optimization problem. Owing to their good performance in solving complex optimization problems, evolutionary algorithms such as the Genetic Algorithm (GA) have been widely used in various disciplines, including complex networks [19, 20, 21]. Searching for the attack scheme that degrades the performance of community detection algorithms the most under a given attack cost is a typical combinatorial optimization problem. We propose an attack strategy based on GA, namely Q-Attack, whose fitness function is designed using modularity, and show its effectiveness through experiments on different community detection algorithms and real-world networks. The major contributions of our work are summarized as follows:

  • First, we introduce and formalize the community detection attack problem, which aims at global community deception and privacy protection;

  • Second, we propose two heuristic attack strategies and a GA based Q-Attack method that deceive community detection by rewiring only a small number of links;

  • Third, we conduct comprehensive experiments to compare the attack effects of different attack strategies against three community detection algorithms on two real-world networks;

  • Finally, we validate the transferability of Q-Attack and argue that community detection attack can also serve as a robust evaluation metric for comparing the anti-attack capacity of community detection algorithms.

The rest of this paper is organized as follows. Sec. II reviews related work on community detection and attacks on complex networks. Sec. III presents our attack strategies for destroying the community structure of a network, including heuristic and optimization methods. We present the experiments and results in Sec. IV and finally draw conclusions in Sec. V.

II Related Work

II-A Community Detection Algorithms

A large number of community detection algorithms have been proposed since the problem was first raised [12], and new ones continue to spring up because of the diversity of real-world networks. Here we briefly introduce some of them.

Girvan and Newman [22] first proposed a community detection algorithm based on edge betweenness, and experiments showed that it performed well on small networks. Other researchers modified this approach and proposed new ones: e.g., Tyler et al. [23] approximated betweenness using the fast algorithm of Brandes, and Radicchi et al. [24] used the edge clustering coefficient instead, so that only local information was required and the efficiency of the algorithm was improved. Besides, Newman [25] also proposed a greedy algorithm that mainly aimed at optimizing modularity. CNM, proposed by Clauset et al. [26], was an improvement of Newman's method whose main technique was the use of efficient data structures (max-heaps) to speed up the greedy merging. Furthermore, Blondel et al. [27] divided the modularity optimization problem into two levels, making the computational complexity essentially linear. Spectral clustering algorithms [28, 29] leveraged the eigenvalue spectrum of several graph matrices to detect the community structure in networks.

Apart from these methods based on modularity optimization, researchers also considered the problem of community detection from other perspectives. The Label Propagation Algorithm (LPA) was proposed by Raghavan et al. [30]; its main idea is to update the label of each node according to the labels of its neighbors. Rosvall et al. [31] proposed an algorithm called Infomap, which is based on information theory and finds the partition of a network by minimizing the average description length of a random walk on it. Further, Ronhovde and Nussinov [32] proposed a community detection algorithm based on minimizing the Hamiltonian of a Potts model.

II-B Attacks on Complex Networks

Several earlier studies have examined how attacks influence network performance. Holme et al. [18] proposed four strategies, ID removal, IB removal, RD removal and RB removal, applied to both nodes and links, to investigate the attack vulnerability of different kinds of networks. ID removal removes nodes (or links) in descending order of their degree in the initial network; "B" stands for betweenness instead of degree, and "R" for recalculating the ranking after each removal. This study validated that the network structure changed and some performance measures degraded under attack. Bellingeri et al. [33] found that deleting nodes according to betweenness centrality was most efficient when the size of the Largest Connected Component (LCC) was used as the measure of network damage. As for community structure, Karrer et al. [34] studied its robustness by perturbing networks in a specific way: they constructed a random network with the same numbers of nodes and links as the original one, and then replaced the links of the original network with those of the random one with a given probability. These studies laid the groundwork for further research on network attacks.

Attacking network algorithms has become an emerging topic in recent years. Yu et al. [35] developed both heuristic and evolutionary methods to add subtle perturbations to original networks, so that some sensitive links would be harder to predict. Dai et al. [36] proposed three methods to modify graph structure, RL-S2V, GradArgmax and GeneticAlg, aiming to fool classifiers based on GNN models. Moreover, Zügner et al. [37] proposed NETTACK, the first adversarial attack on networks using GCN, and found it effective on the node classification task. Quite recently, Bojchevski and Günnemann [38], as well as Chen et al. [39], simultaneously proposed adversarial attacks on network embedding, which may promote extensive discussion on the robustness of network algorithms, since network embedding bridges network space and Euclidean space and thus facilitates many downstream algorithms, such as node classification and link prediction.

II-C Research on Anti-Detection of Communities

Regarding the privacy concerns raised by community detection, some researchers have started paying attention to this problem during the last two years. Waniek et al. [16] came up with two heuristic algorithms, ROAM and DICE, for hiding individuals and communities, respectively. DICE works by disconnecting intra-community links of a given community and connecting inter-community links under a budget, and a measure was proposed to evaluate the concealment. Fionda et al. [17] proposed the concept of community deception to cope with community detection, and mainly tested a safeness-based deception algorithm and a modularity-based deception algorithm through extensive experiments. They also defined a deception score to evaluate the effects of the proposed methods. Similar to the measure proposed in [16], the deception score takes community spread and hiding into account, and it is more comprehensive since reachability is also considered. A target community is considered well hidden when its members are spread widely over the communities detected by a specific community detection algorithm and make up only a small percentage of each detected community.

Figure 1: The framework of community detection attack. We obtain the attack scheme in combination with community detection algorithms. Two metrics, modularity $Q$ and NMI, are used to evaluate the attack effect.

Little attention has been paid so far to attacking a network so that algorithms find it harder to detect its community structure. Moreover, both Waniek's and Fionda's studies are dedicated to hiding only a small, targeted set of nodes (a community in the broad sense). In this paper, we address the problem of information over-mining in community detection from another perspective. Since privacy concerns arise precisely because community detection algorithms have improved in both detection quality and speed, we aim at degrading the performance of these algorithms directly. Our goal is to propose attack strategies that reduce the quality of the communities detected in networks as much as possible, which means a more effective attack, with only subtle network changes.

III Methods

First, we briefly describe the main problem addressed in this paper, together with the related notations and definitions. Let $G=(V,E)$ be the original network, where $V$ is the set of nodes and $E$ is the set of links. An attack solution $\mathcal{A}$ consists of a series of links, each marked "+" if the link is added to the network or "-" if it is removed; we write $\mathcal{A}^{+}$ and $\mathcal{A}^{-}$ for the added and removed links, respectively. The adversarial network is then defined as follows:

$$\hat{G} = (V, \hat{E}), \qquad \hat{E} = (E \setminus \mathcal{A}^{-}) \cup \mathcal{A}^{+}. \tag{1}$$

We thus want to find solutions $\mathcal{A}$ that change some connections in $G$ and yield adversarial networks $\hat{G}$ on which community detection algorithms perform significantly worse, i.e., the quality of the detection results decreases.
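As an illustration of Eq. (1), the following Python sketch assumes that links are stored as unordered node pairs (frozensets) and that an attack solution is a list of ('+' or '-', u, v) tuples. This representation and the helper names `link` and `apply_attack` are hypothetical choices for the example, not taken from the paper.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Link = FrozenSet[int]


def link(u: int, v: int) -> Link:
    """An undirected link stored as an unordered node pair."""
    return frozenset((u, v))


def apply_attack(edges: Set[Link], solution: Iterable[Tuple[str, int, int]]) -> Set[Link]:
    """Adversarial link set per Eq. (1): drop the '-' links, then add the '+' links."""
    solution = list(solution)
    if any(op not in ("+", "-") for op, _, _ in solution):
        raise ValueError("each operation must be '+' (add) or '-' (remove)")
    removed = {link(u, v) for op, u, v in solution if op == "-"}
    added = {link(u, v) for op, u, v in solution if op == "+"}
    if not removed <= edges:
        raise ValueError("removed links must exist in the original network")
    if added & edges:
        raise ValueError("added links must be absent from the original network")
    return (edges - removed) | added


E = {link(0, 1), link(1, 2), link(0, 2), link(2, 3)}
E_hat = apply_attack(E, [("-", 0, 1), ("+", 1, 3)])
print(sorted(tuple(sorted(e)) for e in E_hat))  # [(0, 2), (1, 2), (1, 3), (2, 3)]
```

Rejecting removals of absent links and additions of existing ones keeps every solution a genuine perturbation of $G$.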

Fig. 1 gives the overview of our work. In the following, we give the details of attack strategies for community detection algorithms proposed in this paper, including heuristics and evolutionary ones.

III-A Heuristic Attack Strategy

As introduced above, we attack links to change some relations between nodes. In this paper, we choose the rewiring attack to maintain the degree of target nodes, i.e., we add a link to the target node while deleting one from it. Under one rewiring attack, only two nodes (the removed neighbor and the new neighbor of the target) change their degree, compared with four nodes when a link between two nodes is deleted and a link between another two nodes is added. In social networks, one rewiring attack means that the person chosen as the target node gains a false friend while hiding a real one; the total number of friends of this person remains unchanged, so this kind of attack preserves the person's status in the entire network. In other words, rewiring conceals the attack well.
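The degree bookkeeping above can be checked with a small sketch. It assumes an undirected simple graph given as a list of (u, v) pairs; `degrees`, `rewire` and `changed_nodes` are illustrative helper names of our own.

```python
from collections import Counter


def degrees(edges):
    """Degree of every node that appears in a collection of (u, v) links."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def rewire(edges, t, u, v):
    """One rewiring attack on target t: delete link (t, u) and add link (t, v)."""
    links = {frozenset(e) for e in edges}
    if frozenset((t, u)) not in links:
        raise ValueError("(t, u) must be an existing link")
    if v == t or frozenset((t, v)) in links:
        raise ValueError("(t, v) must be a new link")
    links.remove(frozenset((t, u)))
    links.add(frozenset((t, v)))
    return sorted(tuple(sorted(e)) for e in links)


def changed_nodes(before, after):
    """Nodes whose degree differs between two link lists."""
    d0, d1 = degrees(before), degrees(after)
    return {x for x in set(d0) | set(d1) if d0[x] != d1[x]}


E = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
rewired = rewire(E, t=0, u=1, v=3)                  # rewiring around target 0
separate = [e for e in E if e != (0, 1)] + [(2, 4)]  # delete (0, 1), add unrelated (2, 4)
print(sorted(changed_nodes(E, rewired)))   # [1, 3]: the target keeps its degree
print(sorted(changed_nodes(E, separate)))  # [0, 1, 2, 4]
```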

When designing attack strategies, we first consider a Random Attack (RA) strategy, described in Algorithm 1, for comparison. Here, we randomly select $n$ nodes from the node set $V$ to form a target node set $V_T$. In each iteration, we then randomly select a target node $t \in V_T$ and perform the following operation: delete an existing link of $t$ and add a nonexistent one. The neighbor set of the target node is denoted by $N(t)$, and its non-neighbor set by $\bar{N}(t)$. The number of rewiring attacks (i.e., the number of iterations), denoted by $T$, is another parameter and represents the attack cost. The procedure of the random attack strategy is shown in the following pseudo-code.

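Input: $G=(V,E)$, $n$, $T$
Output: $\hat{G}$
1 $\hat{G} \leftarrow G$;
2 $V_T \leftarrow$ randomly select $n$ nodes from $V$;
3 for $i = 1$ to $T$ do
4       $t \leftarrow$ randomly select a node from $V_T$;
5       obtain $N(t)$ and $\bar{N}(t)$ in $\hat{G}$;
6       if $N(t) \neq \emptyset$ and $\bar{N}(t) \neq \emptyset$ then
7             $u \leftarrow$ randomly select a node from $N(t)$;
8             $v \leftarrow$ randomly select a node from $\bar{N}(t)$;
9             remove $(t,u)$ from $\hat{G}$ and add $(t,v)$ to $\hat{G}$;
10             update $N(t)$ and $\bar{N}(t)$;
11       end if
12 end for
Get the adversarial network $\hat{G}$ under $T$ rewiring attacks.
Algorithm 1 Random Attack (RA)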
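A minimal Python sketch of the random attack follows, assuming integer node IDs and an undirected simple graph. The function name `random_attack`, the `seed` argument and the returned rewiring log are our own additions for reproducibility and inspection. In this sketch, an iteration whose target has no neighbor or no non-neighbor is skipped, so fewer than T rewirings may be applied.

```python
import random


def random_attack(edges, n, T, seed=None):
    """Random Attack (RA) sketch: up to T rewirings on n randomly chosen targets.

    edges: iterable of (u, v) links of an undirected simple graph (integer IDs).
    Returns the adversarial link set and the applied (t, u, v) rewirings,
    where link (t, u) was deleted and link (t, v) was added.
    """
    rng = random.Random(seed)
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = sorted(adj)
    targets = rng.sample(nodes, n)                 # target node set V_T
    log = []
    for _ in range(T):                             # T iterations = attack budget
        t = rng.choice(targets)
        neighbors = sorted(adj[t])                 # N(t)
        non_neighbors = [w for w in nodes if w != t and w not in adj[t]]
        if neighbors and non_neighbors:            # otherwise skip this iteration
            u = rng.choice(neighbors)              # existing link (t, u) to delete
            v = rng.choice(non_neighbors)          # missing link (t, v) to add
            adj[t].remove(u)
            adj[u].remove(t)
            adj[t].add(v)
            adj[v].add(t)
            log.append((t, u, v))
    adversarial = {(a, b) for a in adj for b in adj[a] if a < b}
    return adversarial, log


ring = [(i, (i + 1) % 8) for i in range(8)]
adv, log = random_attack(ring, n=3, T=5, seed=1)
print(len(adv), log)  # 8 links remain; log lists the applied rewirings
```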

A network with community structure typically has a high density of intra-community links and a lower density of inter-community links. Deleting links within communities and adding links between communities can thus weaken the community structure of the network. Therefore, knowing the community structure in advance helps to design more effective attack strategies. Based on this, we propose a heuristic attack strategy that uses the community division produced by a specific community detection algorithm, called Community Detection Attack (CDA); its procedure is shown in Algorithm 2.

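Input: $G=(V,E)$, $n$, $T$
Output: $\hat{G}$
1 $\hat{G} \leftarrow G$;
2 $\mathcal{C} \leftarrow$ communities of $G$ detected by a specific community detection algorithm;
3 $V_T \leftarrow$ randomly select $n$ nodes from $V$;
4 for $i = 1$ to $T$ do
5       $t \leftarrow$ randomly select a node from $V_T$;
6       obtain $N_{\mathrm{in}}(t)$ and $\bar{N}_{\mathrm{out}}(t)$ from $\hat{G}$ and $\mathcal{C}$;
7       if $N_{\mathrm{in}}(t) \neq \emptyset$ and $\bar{N}_{\mathrm{out}}(t) \neq \emptyset$ then
8             $u \leftarrow$ randomly select a node from $N_{\mathrm{in}}(t)$;
9             $v \leftarrow$ randomly select a node from $\bar{N}_{\mathrm{out}}(t)$;
10             remove $(t,u)$ from $\hat{G}$ and add $(t,v)$ to $\hat{G}$;
11             update $N_{\mathrm{in}}(t)$ and $\bar{N}_{\mathrm{out}}(t)$;
12       end if
13 end for
Get the adversarial network $\hat{G}$ under $T$ rewiring attacks.
Algorithm 2 Community Detection Attack (CDA)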

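For a target node $t$ with neighbor set $N(t)$, we define its intra-community neighbor set as $N_{\mathrm{in}}(t) = N(t) \cap C_t$, where $C_t$ is the node set of the community that $t$ belongs to. Similarly, its inter-community non-neighbor set is $\bar{N}_{\mathrm{out}}(t) = \bar{N}(t) \setminus C_t$, with $\bar{N}(t) = V \setminus (N(t) \cup \{t\})$.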

Fig. 2 illustrates graphically how the CDA strategy works. For the target node, removing intra-community links weakens its connection with its original community, while adding inter-community links pulls it toward the other communities.

Figure 2: The node groups represent the communities detected by a specific community detection algorithm. Node 6 is chosen as the target node according to CDA. A line with a red mark represents a deleted link, and a dashed line represents an added link.
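The intra-community neighbor set and inter-community non-neighbor set defined above, and one CDA rewiring step built on them, might look as follows in Python. We assume that the community assignment `community_of` is computed once on the original network and kept fixed during the attack; the function names are hypothetical. Repeating `cda_rewire` T times on randomly chosen targets, as in the random-attack loop, would give the whole strategy.

```python
import random


def cda_candidates(adj, community_of, t):
    """Return (intra-community neighbors, inter-community non-neighbors) of target t."""
    c = community_of[t]
    intra_neighbors = {u for u in adj[t] if community_of[u] == c}
    inter_non_neighbors = {v for v in adj
                           if v != t and v not in adj[t] and community_of[v] != c}
    return intra_neighbors, inter_non_neighbors


def cda_rewire(adj, community_of, t, rng):
    """One CDA rewiring on t: cut an intra-community link, add an inter-community link.

    Mutates adj in place; returns (u, v), or None when either candidate set is empty.
    """
    intra, inter = cda_candidates(adj, community_of, t)
    if not intra or not inter:
        return None
    u = rng.choice(sorted(intra))
    v = rng.choice(sorted(inter))
    adj[t].discard(u)
    adj[u].discard(t)
    adj[t].add(v)
    adj[v].add(t)
    return u, v


# Two triangles joined by link (2, 3); the fixed partition is {0, 1, 2} vs. {3, 4, 5}.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
community_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(cda_candidates(adj, community_of, 2))  # intra {0, 1}, inter {4, 5}
print(cda_rewire(adj, community_of, 0, random.Random(0)))
```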

Many real-world networks have been found to follow a power-law degree distribution [40], consistent with the classic 80/20 rule: a small number of nodes possess a large number of connections, while the rest have only a few. Changes to these hub nodes of large degree usually have a bigger impact on the whole network structure. For instance, people with more friends in a social network usually play more important roles in their circle, and their absence tends to make the group less active. Therefore, we propose another heuristic attack strategy, namely Degree Based Attack (DBA), which targets the nodes of large degree in the network.

Figure 3: The overview of one iteration of GA based Q-Attack Strategy.
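1 procedure NodeChoose($\mathcal{C}$, $n$);
2 for $i = 1$ to $n$ do
3       $v \leftarrow$ BiggestDegree($\mathcal{C}'$);
4       add node $v$ to $V_T$;
5       remove $C'_j$ from $\mathcal{C}'$, where $v \in C'_j$;
6       if $\mathcal{C}' = \emptyset$ then
7             update $\mathcal{C}'$ every turn;
8             continue from step 3 with the updated $\mathcal{C}'$;
9       end if
10 end for
Algorithm 3 Target Nodes Chosen Based on Degree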

The only difference between DBA and CDA is that DBA chooses nodes of large degree as target nodes, while CDA chooses them randomly. We therefore only explain how DBA chooses the target nodes based on node degree, since the rest of the attack is identical to CDA. The procedure is shown in Algorithm 3. Let $\mathcal{C}$ be the division into $k$ communities detected by the community detection algorithm. The function BiggestDegree returns the node with the largest degree in the division $\mathcal{C}'$, whose initial state is $\mathcal{C}' = \mathcal{C}$. The step "update $\mathcal{C}'$ every turn" means removing the nodes that have already been chosen; one turn is finished when $k$ nodes have been chosen from the $k$ communities, one from each community.
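Our reading of the DBA target selection can be sketched as below: within a turn, the highest-degree remaining node is picked and its community is excluded for the rest of that turn, and a new turn starts once every community has contributed one node. Integer node IDs, tie-breaking toward the smaller ID, and the name `node_choose` are assumptions made for illustration.

```python
def node_choose(adj, communities, n):
    """Pick up to n target nodes by degree, one per community in each turn.

    adj: dict node -> set of neighbors (integer node IDs assumed).
    communities: list of node sets, e.g. the detected division C.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = [set(c) for c in communities]      # C' starts as a copy of C
    targets = []
    while len(targets) < n and any(remaining):
        open_comms = [c for c in remaining if c]   # communities available this turn
        while open_comms and len(targets) < n:
            # BiggestDegree: largest degree among communities not yet used this turn
            best = max((v for c in open_comms for v in c),
                       key=lambda v: (degree[v], -v))
            targets.append(best)
            home = next(c for c in open_comms if best in c)
            home.discard(best)                     # chosen nodes leave C'
            open_comms = [c for c in open_comms if c is not home]
    return targets


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
communities = [{0, 1, 2}, {3, 4, 5}]
print(node_choose(adj, communities, 4))  # hubs 2 and 3 first, then 0 and 4
```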

III-B GA Based Q-Attack

Evolutionary algorithms are a class of intelligent optimization techniques designed by simulating natural phenomena or leveraging various mechanisms of living organisms in nature [41, 42, 43]. GA is a typical example of such algorithms. Its basic model was first introduced and investigated by Holland in 1975 [41] and mainly leverages the natural selection mechanism to solve optimization problems. A population of chromosomes, representing individuals with different characteristics, is initialized at the beginning of the algorithm and then generates offspring through designed genetic operators, such as crossover and mutation. Individuals with better fitness have more chances to survive and multiply, so that the population is able to evolve. Here we propose an attack strategy based on GA to search for effective "rewiring" attack schemes. Fig. 3 gives an overview of the evolutionary process.

Before the implementation of GA, we first give our encoding method and fitness function.

  • Encoding: As in the attack strategies proposed in Sec. III-A, we take one "rewiring" attack, consisting of a deleted link and an added link, as a gene. The length of each chromosome equals the number of attacks.

  • Fitness function: Modularity is an important metric for assessing the quality of a particular partition of a network, and it has been widely applied in the field of community detection (more details in Sec. IV-C). Here we design our fitness function as follows:

    (2)

    indicating that individuals whose adversarial networks yield lower modularity will have larger fitness. This is also why we name this attack strategy Q-Attack.
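Eq. (2) is not reproduced here, but its stated property (lower modularity gives larger fitness) can be illustrated. The sketch below computes modularity in the Newman and Girvan form $Q = \sum_i (e_{ii} - a_i^2)$ described in Sec. IV-C and maps it to a fitness value with exp(-Q). The exponential is only an assumed stand-in, chosen because it decreases in Q and stays positive, which roulette-wheel selection requires. In Q-Attack, the partition would be the one returned by the attacked detection algorithm on the adversarial network; here it is supplied by hand.

```python
import math


def modularity(edges, communities):
    """Newman-Girvan modularity Q = sum_i (e_ii - a_i^2); every node needs a community."""
    m = len(edges)
    label = {v: i for i, c in enumerate(communities) for v in c}
    e_in = [0.0] * len(communities)   # links with both ends in community i
    ends = [0.0] * len(communities)   # link ends (degree sum) in community i
    for u, v in edges:
        cu, cv = label[u], label[v]
        ends[cu] += 1
        ends[cv] += 1
        if cu == cv:
            e_in[cu] += 1
    return sum(e_in[i] / m - (ends[i] / (2 * m)) ** 2 for i in range(len(communities)))


def fitness(q):
    """Assumed decreasing, positive map from modularity to fitness (illustrative only)."""
    return math.exp(-q)


edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
good = [{0, 1, 2}, {3, 4, 5}]    # the two triangles
mixed = [{0, 1, 3}, {2, 4, 5}]   # a scrambled division
for part in (good, mixed):
    q = modularity(edges, part)
    print(round(q, 4), round(fitness(q), 4))  # lower Q gives higher fitness
```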

Our goal is to find individuals that make the partition of the adversarial network, as obtained by a specific community detection algorithm, have lower modularity. In the following, we illustrate how GA operates in the community detection attack.

  • Initialization. An initial population of fixed size is randomly generated. Care must be taken to avoid link deletion or addition conflicts, i.e., deleting or adding the same link more than once. Each individual in the population is a combination of rewiring attacks, representing one attack solution on the network.

  • Selection. Selection embodies the survival-of-the-fittest mechanism: according to the laws of natural evolution, individuals of higher fitness have a greater chance to survive and multiply than those of lower fitness. We use a common selection operator, roulette-wheel selection, in which the probability for an individual to mate is proportional to its fitness, as given by Eq. (3):

    $$P_i = \frac{f_i}{\sum_{j=1}^{N_p} f_j}, \tag{3}$$

    where $f_i$ is the fitness of individual $i$ and $N_p$ is the population size.
  • Crossover. We adopt single-point crossover, the easiest crossover to implement in GA: a cut point is generated at random and the gene segments of the two parents after it are exchanged. The probability for paired individuals to generate offspring through crossover is denoted by $p_c$. To avoid conflicts between genes (as mentioned before, two identical deleted links or added links are not allowed on one chromosome), we perform a feasibility test before two individuals are crossed.

  • Mutation. To prevent the solution from falling into a local optimum, a mutation operator is a necessary component of GA. Considering the network characteristics, we propose three kinds of mutation operators: link-deletion mutation, link-addition mutation and link-reconnection mutation. Link-deletion or link-addition mutation changes the deleted or added link of a gene independently, while link-reconnection mutation replaces the gene completely. The three mutations occur with equal probability, and the overall mutation rate is denoted by $p_m$. Such mutation operators promote the diversity of the population. Conflicts are also checked while genes mutate.

  • Elitism. Over the course of evolution, individuals with high fitness in the parent generation may be destroyed. The elitism strategy addresses this problem: in this paper, we replace the worst 10% of the offspring with the best 10% of the parents to preserve good genes.

  • Termination criteria. We set a fixed number of evolutionary generation and the algorithm stops when this condition is fulfilled.

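Input: $G=(V,E)$, $T$, $N_p$, $g_{\max}$, $p_c$, $p_m$
Output: $\hat{G}$
1 $P \leftarrow$ Initialization($G$, $T$, $N_p$); $g \leftarrow 0$;
2 while $g < g_{\max}$ do
3       compute the fitness of every individual in $P$ by Eq. (2);
4       $P' \leftarrow$ Selection($P$) by Eq. (3);
5       $P'' \leftarrow$ Crossover($P'$, $p_c$);
6       $P''' \leftarrow$ Mutation($P''$, $p_m$);
7       $P \leftarrow$ Elitism($P$, $P'''$);
8       $g \leftarrow g + 1$;
9 end while
Get the adversarial network $\hat{G}$ from the best individual in $P$.
Algorithm 4 Q-Attack Based on GA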

In summary, GA based Q-Attack mainly involves the encoding, fitness function and the design of genetic operators. The pseudo-code of this attack strategy is shown in Algorithm 4.
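To make the GA concrete, here is a compact Python sketch of the evolutionary loop with the operators described above: roulette-wheel selection, single-point crossover with a feasibility test, three equally likely mutation types with conflict checks, and 10% elitism. Several details are our assumptions rather than the paper's: a gene is encoded as (t, u, v), meaning delete (t, u) and add (t, v) around a target t; the mutation rate is applied per gene; and the default population size, generation count and rates are illustrative only. The fitness function is passed in, so any community detection algorithm can be plugged in.

```python
import random


def q_attack(edges, T, fitness, pop_size=20, generations=30, pc=0.7, pm=0.1, seed=0):
    """GA search over chromosomes of T rewiring genes (t, u, v).

    fitness(adversarial_links) must return a positive number; larger is better.
    Returns the best chromosome and its adversarial link list.
    """
    rng = random.Random(seed)
    E = {frozenset(e) for e in edges}
    edge_list = sorted(tuple(sorted(e)) for e in E)
    nodes = sorted({x for e in edge_list for x in e})
    adj = {x: set() for x in nodes}
    for a, b in edge_list:
        adj[a].add(b)
        adj[b].add(a)

    def non_neighbors(t):
        return [w for w in nodes if w != t and w not in adj[t]]

    def random_gene():
        while True:  # assumes some endpoint is not linked to every other node
            a, b = rng.choice(edge_list)
            t, u = (a, b) if rng.random() < 0.5 else (b, a)
            cands = non_neighbors(t)
            if cands:
                return (t, u, rng.choice(cands))

    def feasible(chrom):
        """Conflict check: no link may be deleted twice or added twice."""
        dels = [frozenset((t, u)) for t, u, _ in chrom]
        adds = [frozenset((t, v)) for t, _, v in chrom]
        return len(set(dels)) == len(dels) and len(set(adds)) == len(adds)

    def random_chromosome():
        chrom = []
        while len(chrom) < T:
            g = random_gene()
            if feasible(chrom + [g]):
                chrom.append(g)
        return chrom

    def apply(chrom):
        adv = set(E)
        for t, u, v in chrom:
            adv.discard(frozenset((t, u)))
            adv.add(frozenset((t, v)))
        return sorted(tuple(sorted(e)) for e in adv)

    def crossover(a, b):
        """Single-point crossover; parents are kept if a child is infeasible."""
        if T < 2 or rng.random() >= pc:
            return a[:], b[:]
        cut = rng.randint(1, T - 1)
        c1, c2 = a[:cut] + b[cut:], b[:cut] + a[cut:]
        return (c1, c2) if feasible(c1) and feasible(c2) else (a[:], b[:])

    def mutate(chrom):
        """Per-gene link-deletion, link-addition or link-reconnection mutation."""
        chrom = chrom[:]
        for i, (t, u, v) in enumerate(chrom):
            if rng.random() >= pm:
                continue
            kind = rng.choice(("delete", "add", "reconnect"))
            if kind == "delete":
                g = (t, rng.choice(sorted(adj[t])), v)
            elif kind == "add":
                g = (t, u, rng.choice(non_neighbors(t)))
            else:
                g = random_gene()
            if feasible(chrom[:i] + [g] + chrom[i + 1:]):
                chrom[i] = g
        return chrom

    pop = [random_chromosome() for _ in range(pop_size)]
    n_elite = max(1, pop_size // 10)                    # best 10% of the parents
    for _ in range(generations):
        fits = [fitness(apply(c)) for c in pop]
        elites = [pop[i] for i in sorted(range(pop_size), key=lambda i: -fits[i])[:n_elite]]
        offspring = []
        while len(offspring) < pop_size:
            a, b = rng.choices(pop, weights=fits, k=2)  # roulette wheel, Eq. (3)
            offspring.extend(mutate(c) for c in crossover(a, b))
        offspring = offspring[:pop_size]
        off_fits = [fitness(apply(c)) for c in offspring]
        worst = sorted(range(pop_size), key=lambda i: off_fits[i])[:n_elite]
        for i, elite in zip(worst, elites):
            offspring[i] = elite                        # elitism
        pop = offspring
    fits = [fitness(apply(c)) for c in pop]
    best = max(range(pop_size), key=lambda i: fits[i])
    return pop[best], apply(pop[best])
```

In the attack itself, `fitness` would run the chosen community detection algorithm (FN, SOA or Louvain) on the adversarial links and return a decreasing, positive function of the resulting modularity.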

IV Experiments

IV-A Datasets

In order to evaluate the attack effect of our attack strategies, we test them on two real-world datasets, namely Zachary's karate club network [44] and the bottlenose dolphin network [45], which are commonly used for community detection because their ground-truth partitions are known.

  • Zachary's karate club network: This dataset, recorded by Zachary, describes the relationships in a karate club. The network consists of 34 nodes and 78 links, and the club eventually split into two groups due to a conflict between the instructor and the administrator.

  • Bottlenose dolphin network: This network was created by Lusseau and describes the associations between dolphins living in Doubtful Sound, New Zealand. The network consists of 62 nodes and 159 links. In reality, these dolphins are divided into two groups.

IV-B Community Detection Algorithms

Here, we introduce three well-known community detection algorithms that are attacked by our strategies proposed in Sec. III. Note that these algorithms are mainly used for detecting non-overlapping community structures.

  • Fast Newman Algorithm (FN) [25]: It is a hierarchical agglomerative algorithm. Each node is initially its own community, and pairs of communities are merged repeatedly so as to optimize the modularity $Q$, until all the nodes are merged into one community; the partition with the largest $Q$ encountered during the merging is taken as the result.

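  • Spectral Optimization Algorithm (SOA) [28, 29]: By analyzing the spectral properties of networks, Newman gives a reformulation of modularity:

    $$Q = \frac{1}{4m}\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right) s_i s_j = \frac{1}{4m}\, s^{\mathsf{T}} B s, \tag{4}$$

    where $m$ is the number of links, $A_{ij}$ is an element of the adjacency matrix $A$, $s$ is a column vector whose element $s_i$ represents the community label of node $i$ ($s_i = \pm 1$ for a division into two groups), $k_i$ and $k_j$ are the degrees of nodes $i$ and $j$, respectively, and $B$, with elements $B_{ij} = A_{ij} - k_i k_j/(2m)$, is called the modularity matrix. The communities that nodes belong to are determined by the signs of the elements of the leading eigenvector of the modularity matrix.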

  • Louvain Algorithm (LOU) [27]: The main idea of this algorithm is to treat community detection as a multi-level modularity optimization problem. First, it starts with isolated nodes (each node in its own community) and repeatedly moves nodes to the neighboring community that yields the maximum modularity gain, until no further improvement can be achieved. Second, it merges each community found in the first step into a super-node and constructs a new network. These two steps are performed repeatedly until the algorithm is stable.
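A single bisection step of the spectral method in Eq. (4) can be sketched with NumPy: build $B$, take the eigenvector of its largest eigenvalue, and split the nodes by sign. The sketch assumes nodes are labelled 0 to n-1 and omits the repeated subdivision that Newman's method applies to obtain more than two communities; `spectral_bisection` is a hypothetical name.

```python
import numpy as np


def spectral_bisection(edges, n):
    """Split nodes 0..n-1 by the signs of the leading eigenvector of B (Eq. (4))."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    k = A.sum(axis=1)                        # node degrees k_i
    m = k.sum() / 2.0                        # number of links
    B = A - np.outer(k, k) / (2.0 * m)       # modularity matrix
    _, vecs = np.linalg.eigh(B)              # eigenvalues come in ascending order
    s = np.where(vecs[:, -1] >= 0, 1.0, -1.0)
    q = float(s @ B @ s) / (4.0 * m)         # Q = s^T B s / (4m)
    groups = [{i for i in range(n) if s[i] > 0}, {i for i in range(n) if s[i] < 0}]
    return groups, q


edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
groups, q = spectral_bisection(edges, 6)
print(groups, round(q, 4))  # the two triangles, Q = 5/14
```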

IV-C Metrics for Community Structure

  • Modularity: Modularity, first proposed by Newman and Girvan [46], is widely used to measure the quality of the divisions of a network, especially for networks with unknown community structure. It is defined by

    $$Q = \sum_{i} \left(e_{ii} - a_i^{2}\right), \tag{5}$$

    where $e_{ii}$ is the fraction of links in the network with both ends in community $i$, and $a_i$ is the fraction of link ends attached to nodes in community $i$. In other words, modularity measures the difference between the actual fraction of within-community links and the expected value of the same quantity under random connections. A reformulation of modularity is given by Eq. (4).

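  • Normalized Mutual Information (NMI): Another commonly used criterion for assessing the quality of clustering results in community structure analysis, proposed by Danon et al. [47]. For two partitions $A$ and $B$, the mutual information is defined as the relative entropy between the joint distribution $P(a,b)$ and the product distribution $P(a)P(b)$, where $P(a)$ is the fraction of nodes in cluster $a$ and $P(a,b)$ is the fraction of nodes in both cluster $a$ of $A$ and cluster $b$ of $B$:

    $$I(A,B) = \sum_{a \in A} \sum_{b \in B} P(a,b) \log \frac{P(a,b)}{P(a)P(b)}. \tag{6}$$

    A notable problem with using mutual information alone as a similarity measure is that any partition derived from $A$ by splitting some of its clusters into smaller ones has the same mutual information with $A$ as $A$ itself. NMI is therefore proposed to deal with this problem and is defined by

    $$\mathrm{NMI}(A,B) = \frac{2\, I(A,B)}{H(A) + H(B)}, \tag{7}$$

    where $H(A) = -\sum_{a \in A} P(a)\log P(a)$ is the entropy of partition $A$, and $H(B)$ is defined analogously. The value of NMI indicates the similarity between two partitions: a larger value means the two are more similar, and NMI equals 1 when partition $B$ is identical to partition $A$.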
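Eqs. (6) and (7) translate directly into a few lines of Python, with each partition given as a list of community labels aligned by node and probabilities estimated as node fractions. The example also shows the problem noted above: a partition and its refinement have the same mutual information as the partition with itself, while NMI stays below 1. The function names and returning 1.0 when both entropies are zero are our own conventions.

```python
import math
from collections import Counter


def entropy(labels):
    """H(A) = -sum_a P(a) log P(a), with P(a) the fraction of nodes in cluster a."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """I(A, B) of Eq. (6) for two label lists over the same nodes."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def nmi(a, b):
    """NMI of Eq. (7): 2 I(A, B) / (H(A) + H(B))."""
    denom = entropy(a) + entropy(b)
    return 1.0 if denom == 0 else 2 * mutual_information(a, b) / denom


truth = [0, 0, 1, 1]
split = [0, 1, 2, 3]                      # refines truth into singletons
print(mutual_information(truth, truth))   # ln 2
print(mutual_information(truth, split))   # also ln 2
print(nmi(truth, split))                  # about 0.667, i.e. 2/3
```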

Figure 4: Curves of best fitness for various values of the parameters $p_c$ and $p_m$ when Q-Attack is applied to the FN algorithm on the Dolphins network.

IV-D Experimental Results

Our experiments are performed on a PC with a 2.60 GHz Intel i5 CPU and 4 GB RAM, and our programming environment is Python 2.7. GA based Q-Attack has four parameters. We set the population size and the number of evolutionary generations uniformly across experiments. We then tune the crossover rate $p_c$ and the mutation rate $p_m$ over four values each ($p_c \in \{0.5, 0.6, 0.7, 0.8\}$ and $p_m \in \{0.04, 0.06, 0.08, 0.1\}$) and choose the configuration according to the results. Fig. 4 shows the curves of best fitness for a set of experiments in which Q-Attack is applied to the FN algorithm on the Dolphins network; the optimal parameters in this case are $p_c = 0.6$ and $p_m = 0.06$, obtained under a medium value of the attack number $T$. Table I gives the optimal parameters we use for Q-Attack in all cases.

Parameter   FN+Kar   FN+Dol   SOA+Kar   SOA+Dol   LOU+Kar   LOU+Dol
$p_c$       0.8      0.6      0.7       0.6       0.7       0.8
$p_m$       0.1      0.06     0.1       0.04      0.1       0.1
Table I: Optimal parameters for Q-Attack in different cases
Figure 5: The attack effects of the four attack strategies on three community detection algorithms and Karate network, for various numbers of attacks.
Figure 6: The attack effects of the four attack strategies on three community detection algorithms and Dolphins network, for various numbers of attacks.

We test our attack strategies under different attack numbers $T$. Sec. IV-C briefly introduced two metrics, modularity and NMI: modularity evaluates the results from the perspective of the significance of the community structure, and NMI from the perspective of accuracy. These are the two most common metrics in related research, so we use them to evaluate the effectiveness of the attack strategies. The results are shown in Figs. 5 and 6. For the heuristic attack strategies proposed in Sec. III-A, we set the number of target nodes $n$ to about 15% of the nodes for the Karate network and about 10% for the Dolphins network. For each value of $T$, we generate 50 adversarial networks for each strategy and record the mean values of $Q$ and NMI. For our GA based Q-Attack strategy, we set the parameters according to Table I. With $T$ varying from 2 to 8, each point on the curve is the mean over 10 runs. For $T=1$, the crossover operator described in Sec. III-B cannot be applied, since a chromosome with a single gene has no cut point. There are two solutions to this problem: 1) perform an internal crossover within one rewiring attack, i.e., swap the deleted link or the added link; 2) obtain the best result by enumerating all possible situations. Here we use the second solution, since both networks are small and the enumeration takes little time.

From Figs. 5 and 6, we can see that, in general, for almost all attack strategies the values of $Q$ and NMI decrease as the budget $T$ increases, i.e., the attack effect becomes more significant when more links can be changed. As expected, Q-Attack is the most effective at reducing $Q$. This is not surprising, since it is designed to minimize $Q$ through the fitness function of Eq. (2). Moreover, for the other metric, NMI, the curves for Q-Attack also decline more sharply than the others, indicating the best attack effect on both the Karate and Dolphins networks. These results validate that our GA based Q-Attack is indeed more effective both in weakening the community structure and in reducing the similarity between the community detection results and the true labels. We also notice that, in most situations, DBA does a better job than CDA in reducing $Q$, while the opposite holds for NMI. This seems reasonable, considering that nodes of large degree play a vital role in maintaining the network structure, so attacking them may affect the structure more significantly. On the other hand, such large-degree nodes usually sit at the centers of communities and are therefore relatively hard to move into wrong communities when attacked, which may explain the smaller reduction of NMI for DBA than for CDA.

Strategy FN+Kar FN+Dol SOA+Kar SOA+Dol LOU+Kar LOU+Dol
Modularity NMI Modularity NMI Modularity NMI Modularity NMI Modularity NMI Modularity NMI
Original 0.381 0.692 0.495 0.573 0.371 1.00 0.390 0.753 0.419 0.587 0.519 0.511
RA 0.24% 19.14% 0.09% 12.52% 5.62% 8.16% 3.81% 11.84% 4.43% 3.79% 2.15% 11.71%
CDA 0.53% 24.23% 2.88% 18.14% 8.64% 19.84% 5.10% 22.94% 6.56% 8.14% 4.71% 11.45%
DBA 4.46% 8.71% 6.49% 15.98% 13.24% 16.84% 12.51% 13.41% 10.75% 2.65% 5.53% 9.48%
Q-Attack 23.64% 41.06% 26.04% 45.27% 37.70% 53.81% 44.53% 62.83% 26.92% 41.30% 24.38% 35.24%
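Table II: Attack results when the attack number is set to 5% of the total number of links in the networks. Apart from the Original row, which gives the values without attack, only the relative reductions of modularity and NMI, compared with their values in the original network, are presented. Kar and Dol denote the Karate and Dolphins networks, respectively.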
Figure 7: (a) Communities detected by the Louvain algorithm on the Karate network. (b) Communities detected by the Louvain algorithm on the adversarial network. Node colors indicate the detected communities, while node shapes indicate the communities given by the ground-truth labels.

To provide a more quantitative comparison, we present the relative reductions of modularity and NMI under the four attack strategies on the two networks in Table II, where the top two results in each column are marked in bold. The attack number is set to 5% of the total number of links in each network. Again, our GA based Q-Attack has significantly better attack effects, in terms of larger relative reductions of both modularity and NMI, than the other three strategies (RA, CDA and DBA) in all cases, i.e., for every pair of community detection algorithm and network. DBA has better attack effects than CDA in all cases when modularity is used as the performance metric, while the results are reversed for NMI. As expected, Q-Attack clearly outperforms the random attack (RA) strategy in every case, and CDA and DBA usually do as well, although not always on NMI (e.g., DBA on FN+Kar).
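The per-column comparisons above can be checked mechanically. The sketch below transcribes the relative reductions (%) of Table II and, for each column, ranks the four strategies; the top two in each column should correspond to the entries marked in bold. The column labels are our own shorthand.

```python
# Relative reductions (%) transcribed from Table II; column order:
# (FN+Kar, FN+Dol, SOA+Kar, SOA+Dol, LOU+Kar, LOU+Dol) x (modularity, NMI).
COLUMNS = [f"{pair} {metric}"
           for pair in ("FN+Kar", "FN+Dol", "SOA+Kar", "SOA+Dol", "LOU+Kar", "LOU+Dol")
           for metric in ("modularity", "NMI")]
TABLE_II = {
    "RA":       [0.24, 19.14, 0.09, 12.52, 5.62, 8.16, 3.81, 11.84, 4.43, 3.79, 2.15, 11.71],
    "CDA":      [0.53, 24.23, 2.88, 18.14, 8.64, 19.84, 5.10, 22.94, 6.56, 8.14, 4.71, 11.45],
    "DBA":      [4.46, 8.71, 6.49, 15.98, 13.24, 16.84, 12.51, 13.41, 10.75, 2.65, 5.53, 9.48],
    "Q-Attack": [23.64, 41.06, 26.04, 45.27, 37.70, 53.81, 44.53, 62.83, 26.92, 41.30, 24.38, 35.24],
}


def top_two(table, col):
    """Names of the two strategies with the largest reduction in column `col`."""
    ranked = sorted(table, key=lambda name: table[name][col], reverse=True)
    return ranked[:2]


for col, name in enumerate(COLUMNS):
    print(f"{name:20s} top two: {top_two(TABLE_II, col)}")

# DBA vs CDA: even columns are modularity, odd columns are NMI.
dba_beats_cda = [TABLE_II["DBA"][c] > TABLE_II["CDA"][c] for c in range(len(COLUMNS))]
print("DBA > CDA on every modularity column:", all(dba_beats_cda[0::2]))
print("CDA > DBA on every NMI column:", not any(dba_beats_cda[1::2]))
```

Running it shows, for example, that in the LOU+Dol NMI column the runner-up behind Q-Attack is RA (11.71%) rather than CDA (11.45%) or DBA (9.48%).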

In Fig. 7, we visualize an example in which Q-Attack is applied to the Louvain algorithm on the Karate network. The community structure of the original network detected by the Louvain algorithm is shown in Fig. 7 (a); its modularity and NMI are 0.4188 and 0.5866, respectively. We also use the Louvain algorithm to detect the community structure of the adversarial network obtained with 4 rewiring attacks, shown in Fig. 7 (b). After the attack, the modularity falls to 0.3053, the NMI falls to 0.3078, and the number of communities changes from 4 to 5. The community detection result is indeed noticeably more disordered after the attack, indicating the good attack effect of Q-Attack.
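The relative reduction reported in Tables II and III can be sketched as below, assuming the usual definition 100 × (before - after) / before. Applied to the single adversarial network of Fig. 7, it gives reductions of roughly 27.10% in modularity and 47.53% in NMI; these single-run values differ from the averaged LOU+Kar entries of Table II (26.92% and 41.30%).

```python
def relative_reduction(before, after):
    """Relative reduction (%) of a metric after the attack (assumed definition)."""
    if before == 0:
        raise ValueError("relative reduction is undefined for a zero baseline")
    return 100.0 * (before - after) / before


# Values reported in the text for the Louvain / Karate example of Fig. 7.
q_drop = relative_reduction(0.4188, 0.3053)
nmi_drop = relative_reduction(0.5866, 0.3078)
print(f"modularity reduced by {q_drop:.2f}%, NMI reduced by {nmi_drop:.2f}%")
```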

Dataset Metric Network FN SOA LOU INF LPA
Karate Modularity Original 0.381 0.371 0.419 0.402 0.380
RA 0.380(0.24%) 0.351(5.62%) 0.400(4.43%) 0.387(3.64%) 0.312(18.13%)
CDA 0.379(0.53%) 0.339(8.64%) 0.391(6.56%) 0.350(12.93%) 0.290(23.89%)
DBA 0.364(4.46%) 0.322(13.24%) 0.374(10.75%) 0.338(15.93%) 0.280(26.48%)
Q-Attack(FN) 0.291(23.64%) 0.320(13.84%) 0.376(10.29%) 0.318(20.88%) 0.240(36.91%)
Q-Attack(SOA) 0.377(1.00%) 0.231(37.70%) 0.400(4.52%) 0.344(14.49%) 0.274(27.98%)
Q-Attack(LOU) 0.341(10.29%) 0.324(12.72%) 0.306(26.92%) 0.263(34.67%) 0.260(31.57%)
NMI Original 0.692 1.000 0.587 0.699 0.806
RA 0.560(19.14%) 0.918(8.16%) 0.564(3.79%) 0.634(9.43%) 0.583(27.58%)
CDA 0.525(24.23%) 0.802(19.84%) 0.539(8.14%) 0.533(23.76%) 0.528(34.45%)
DBA 0.632(8.71%) 0.832(16.84%) 0.571(2.65%) 0.648(7.33%) 0.539(33.12%)
Q-Attack(FN) 0.408(41.06%) 0.951(4.89%) 0.560(4.48%) 0.580(17.12%) 0.484(39.91%)
Q-Attack(SOA) 0.572(17.47%) 0.462(53.81%) 0.606(-3.37%) 0.556(20.45%) 0.449(44.31%)
Q-Attack(LOU) 0.500(27.81%) 0.902(9.77%) 0.344(41.30%) 0.509(27.24%) 0.582(27.71%)
Dolphins Modularity Original 0.495 0.390 0.519 0.524 0.506
RA 0.495(0.09%) 0.375(3.81%) 0.507(2.15%) 0.508(3.19%) 0.456(9.83%)
CDA 0.481(2.88%) 0.370(5.10%) 0.494(4.71%) 0.489(6.76%) 0.416(17.74%)
DBA 0.463(6.49%) 0.341(12.51%) 0.490(5.53%) 0.479(8.72%) 0.413(18.26%)
Q-Attack(FN) 0.366(26.04%) 0.349(10.55%) 0.481(7.32%) 0.485(7.48%) 0.379(25.17%)
Q-Attack(SOA) 0.476(3.92%) 0.217(44.43%) 0.492(5.12%) 0.492(6.25%) 0.432(14.54%)
Q-Attack(LOU) 0.455(8.08%) 0.348(10.85%) 0.389(24.96%) 0.471(10.24%) 0.406(19.82%)
NMI Original 0.573 0.753 0.511 0.553 0.607
RA 0.501(12.52%) 0.664(11.84%) 0.451(11.71%) 0.479(13.47%) 0.575(5.28%)
CDA 0.469(18.14%) 0.580(22.94%) 0.452(11.45%) 0.462(16.53%) 0.598(1.47%)
DBA 0.481(15.98%) 0.652(13.41%) 0.462(9.48%) 0.466(15.80%) 0.592(2.41%)
Q-Attack(FN) 0.313(45.27%) 0.637(15.45%) 0.423(17.14%) 0.487(11.93%) 0.603(0.72%)
Q-Attack(SOA) 0.483(15.63%) 0.279(62.98%) 0.488(4.50%) 0.463(16.22%) 0.499(17.72%)
Q-Attack(LOU) 0.451(21.23%) 0.643(14.64%) 0.382(25.17%) 0.497(10.21%) 0.584(3.81%)
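Table III: Analysis of the transferability of Q-Attack. Here, we use the adversarial networks generated by Q-Attack on particular community detection algorithms, i.e., Q-Attack(FN), Q-Attack(SOA) and Q-Attack(LOU), to attack other algorithms. RA, CDA and DBA are included as references. Each entry gives the value after the attack, with the relative reduction in parentheses.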

IV-E Transferability of Q-Attack

Q-Attack is based on modularity, which is computed on the community partition produced by a specific community detection algorithm. There is therefore a risk that an adversarial network designed by Q-Attack against one community detection algorithm may lose its effectiveness against others. If so, Q-Attack would be of limited practical use, since many community detection algorithms exist, any of which could be applied, and we may not know which one is being used. To address this concern, we design experiments to test the transferability of Q-Attack, i.e., whether the adversarial network obtained for a given community detection algorithm remains effective against other algorithms.

In addition to FN, SOA and LOU, we adopt two further community detection algorithms, Infomap (INF) and LPA, which are not directly based on modularity. The results are presented in Table III, where, again, the top two results in each case are marked in bold. We find that the adversarial networks obtained by Q-Attack on FN, SOA or LOU remain effective to a certain extent on other community detection algorithms, whether or not those algorithms are modularity based. In fact, in most cases, Q-Attack based on FN, SOA or LOU is even more effective than the heuristic strategies at attacking other community detection algorithms, although the transfer is not uniform; for example, the adversarial Karate network obtained by Q-Attack on SOA slightly increases the NMI of LOU (a relative reduction of -3.37%). These results suggest that Q-Attack has relatively good transferability: rewiring a small number of connections, guided by Q-Attack on a particular community detection algorithm, can successfully attack many others, making it feasible to protect individual privacy against community detection.
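One way to make the "most cases" statement concrete is sketched below. Using the percentages transcribed from Table III, it compares, for each network, metric and attacked algorithm, the best Q-Attack network generated on a different algorithm against the best of RA, CDA and DBA. This win criterion is our own choice for illustration, not one defined in the paper.

```python
# Relative reductions (%) transcribed from Table III.
# Every list follows the column order of ALGOS (the attacked algorithm).
ALGOS = ["FN", "SOA", "LOU", "INF", "LPA"]
TABLE_III = {
    ("Karate", "modularity"): {
        "RA": [0.24, 5.62, 4.43, 3.64, 18.13],
        "CDA": [0.53, 8.64, 6.56, 12.93, 23.89],
        "DBA": [4.46, 13.24, 10.75, 15.93, 26.48],
        "FN": [23.64, 13.84, 10.29, 20.88, 36.91],
        "SOA": [1.00, 37.70, 4.52, 14.49, 27.98],
        "LOU": [10.29, 12.72, 26.92, 34.67, 31.57],
    },
    ("Karate", "NMI"): {
        "RA": [19.14, 8.16, 3.79, 9.43, 27.58],
        "CDA": [24.23, 19.84, 8.14, 23.76, 34.45],
        "DBA": [8.71, 16.84, 2.65, 7.33, 33.12],
        "FN": [41.06, 4.89, 4.48, 17.12, 39.91],
        "SOA": [17.47, 53.81, -3.37, 20.45, 44.31],
        "LOU": [27.81, 9.77, 41.30, 27.24, 27.71],
    },
    ("Dolphins", "modularity"): {
        "RA": [0.09, 3.81, 2.15, 3.19, 9.83],
        "CDA": [2.88, 5.10, 4.71, 6.76, 17.74],
        "DBA": [6.49, 12.51, 5.53, 8.72, 18.26],
        "FN": [26.04, 10.55, 7.32, 7.48, 25.17],
        "SOA": [3.92, 44.43, 5.12, 6.25, 14.54],
        "LOU": [8.08, 10.85, 24.96, 10.24, 19.82],
    },
    ("Dolphins", "NMI"): {
        "RA": [12.52, 11.84, 11.71, 13.47, 5.28],
        "CDA": [18.14, 22.94, 11.45, 16.53, 1.47],
        "DBA": [15.98, 13.41, 9.48, 15.80, 2.41],
        "FN": [45.27, 15.45, 17.14, 11.93, 0.72],
        "SOA": [15.63, 62.98, 4.50, 16.22, 17.72],
        "LOU": [21.23, 14.64, 25.17, 10.21, 3.81],
    },
}
HEURISTICS = ("RA", "CDA", "DBA")
SOURCES = ("FN", "SOA", "LOU")  # algorithms Q-Attack was run against


def transfer_wins(table):
    """Return (case, won) pairs; won is True when some Q-Attack network generated
    on a different algorithm beats every heuristic on the attacked algorithm."""
    results = []
    for (network, metric), rows in table.items():
        for col, target in enumerate(ALGOS):
            best_heuristic = max(rows[h][col] for h in HEURISTICS)
            best_transfer = max(rows[s][col] for s in SOURCES if s != target)
            results.append(((network, metric, target), best_transfer > best_heuristic))
    return results


results = transfer_wins(TABLE_III)
wins = sum(won for _, won in results)
print(f"transferred Q-Attack beats all heuristics in {wins} of {len(results)} cases")
for case, won in results:
    if not won:
        print("not beaten:", case)
```

Under this criterion the transferred networks win in 14 of the 20 cases; the six exceptions include, for example, the SOA column for both networks under NMI.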

V Conclusion

In this paper, we propose several strategies, including CDA, DBA and GA based Q-Attack, to attack a number of community detection algorithms. We perform experiments on two well-known social networks and find that all of these strategies can partially defeat the community detection algorithms, in terms of decreasing modularity and NMI, by disturbing only a small number of connections. By comparison, Q-Attack performs much better than CDA and DBA in all the considered white-box cases, where the target community detection algorithm is known in advance. Moreover, this attack strategy also shows a certain transferability, which enables effective black-box attacks. That is, the adversarial networks generated by Q-Attack on particular community detection algorithms can still be used to effectively attack many others. These results indicate that it is feasible to use Q-Attack to protect individual privacy against community detection in reality.

As more and more attention is focused on the over-mining of information in social networks, many problems remain to be studied. For instance, the community detection algorithms considered in our experiments are mainly designed to detect non-overlapping communities, but many real-world communities overlap, so our attack strategies could be extended to this more general situation. Besides, our Q-Attack method takes only modularity as the optimization objective, for simplicity; methods based on multi-objective optimization are therefore also worth studying and may achieve better attack effects.

Acknowledgment

The authors would like to thank all the members of our IVSN research group at Zhejiang University of Technology for valuable discussions about the ideas and technical details presented in this paper.

References

  • [1] Q. Xuan, Z.-Y. Zhang, C. Fu, H.-X. Hu, and V. Filkov, “Social synchrony on complex networks,” IEEE Transactions on Cybernetics, vol. 48, no. 5, pp. 1420–1431, 2018.
  • [2] Q. Xuan, H. Fang, C. Fu, and V. Filkov, “Temporal motifs reveal collaboration patterns in online task-oriented networks,” Physical Review E, vol. 91, no. 5, p. 052813, 2015.
  • [3] E. Bullmore and O. Sporns, “Complex brain networks: graph theoretical analysis of structural and functional systems,” Nature Reviews Neuroscience, vol. 10, no. 3, p. 186, 2009.
  • [4] J. O. Garcia, A. Ashourvan, S. Muldoon, J. M. Vettel, and D. S. Bassett, “Applications of community detection techniques to brain graphs: Algorithmic considerations and implications for neural function,” Proceedings of the IEEE, vol. 106, no. 5, pp. 846–867, 2018.
  • [5] Z. Chen, J. Wu, Y. Xia, and X. Zhang, “Robustness of interdependent power grids and communication networks: A complex network perspective,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 65, no. 1, pp. 115–119, 2018.
  • [6] S. Schiavo, J. Reyes, and G. Fagiolo, “International trade and financial integration: a weighted network analysis,” Quantitative Finance, vol. 10, no. 4, pp. 389–399, 2010.
  • [7] A. Lancichinetti and S. Fortunato, “Community detection algorithms: a comparative analysis,” Physical Review E, vol. 80, no. 5, p. 056117, 2009.
  • [8] M. E. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.
  • [9] ——, “The structure of scientific collaboration networks,” Proceedings of the National Academy of Sciences, vol. 98, no. 2, pp. 404–409, 2001.
  • [10] P. M. Gleiser and L. Danon, “Community structure in jazz,” Advances in Complex Systems, vol. 6, no. 4, pp. 565–573, 2003.
  • [11] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.
  • [12] S. Fortunato and D. Hric, “Community detection in networks: A user guide,” Physics Reports, vol. 659, pp. 1–44, 2016.
  • [13] Q. Xuan, M. Zhou, Z.-Y. Zhang, C. Fu, Y. Xiang, Z. Wu, and V. Filkov, “Modern food foraging patterns: Geography and cuisine choices of restaurant patrons on yelp,” IEEE Transactions on Computational Social Systems, vol. 5, no. 2, pp. 508–517, 2018.
  • [14] C. Fu, M. Zhao, L. Fan, X. Chen, J. Chen, Z. Wu, Y. Xia, and Q. Xuan, “Link weight prediction using supervised learning methods and its application to yelp layered network,” IEEE Transactions on Knowledge and Data Engineering, 2018.
  • [15] S. Nagaraja, “The impact of unlinkability on adversarial community detection: effects and countermeasures,” in International Symposium on Privacy Enhancing Technologies Symposium.   Springer, 2010, pp. 253–272.
  • [16] M. Waniek, T. P. Michalak, M. J. Wooldridge, and T. Rahwan, “Hiding individuals and communities in a social network,” Nature Human Behaviour, vol. 2, no. 2, p. 139, 2018.
  • [17] V. Fionda and G. Pirro, “Community deception or: How to stop fearing community detection algorithms,” IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 4, pp. 660–673, 2018.
  • [18] P. Holme, B. J. Kim, C. N. Yoon, and S. K. Han, “Attack vulnerability of complex networks,” Physical Review E, vol. 65, no. 5, p. 056109, 2002.
  • [19] M. Tasgin, A. Herdagdelen, and H. Bingol, “Community detection in complex networks using genetic algorithms,” arXiv preprint arXiv:0711.0491, 2007.
  • [20] H. Liu, X.-B. Hu, S. Yang, K. Zhang, and E. Di Paolo, “Application of complex network theory and genetic algorithm in airline route networks,” Transportation Research Record, vol. 2214, no. 1, pp. 50–58, 2011.
  • [21] S. Wang, H. Zou, Q. Sun, X. Zhu, and F. Yang, “Community detection via improved genetic algorithm in complex network,” Information Technology Journal, vol. 11, no. 3, pp. 384–387, 2012.
  • [22] M. Girvan and M. E. Newman, “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences, vol. 99, no. 12, pp. 7821–7826, 2002.
  • [23] J. R. Tyler, D. M. Wilkinson, and B. A. Huberman, “E-mail as spectroscopy: Automated discovery of community structure within organizations,” The Information Society, vol. 21, no. 2, pp. 143–153, 2005.
  • [24] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi, “Defining and identifying communities in networks,” Proceedings of the National Academy of Sciences, vol. 101, no. 9, pp. 2658–2663, 2004.
  • [25] M. E. Newman, “Fast algorithm for detecting community structure in networks,” Physical Review E, vol. 69, no. 6, p. 066133, 2004.
  • [26] A. Clauset, M. E. Newman, and C. Moore, “Finding community structure in very large networks,” Physical Review E, vol. 70, no. 6, p. 066111, 2004.
  • [27] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, p. P10008, 2008.
  • [28] M. E. Newman, “Modularity and community structure in networks,” Proceedings of the National Academy of Sciences, vol. 103, no. 23, pp. 8577–8582, 2006.
  • [29] ——, “Finding community structure in networks using the eigenvectors of matrices,” Physical Review E, vol. 74, no. 3, p. 036104, 2006.
  • [30] U. N. Raghavan, R. Albert, and S. Kumara, “Near linear time algorithm to detect community structures in large-scale networks,” Physical Review E, vol. 76, no. 3, p. 036106, 2007.
  • [31] M. Rosvall and C. T. Bergstrom, “Maps of random walks on complex networks reveal community structure,” Proceedings of the National Academy of Sciences, vol. 105, no. 4, pp. 1118–1123, 2008.
  • [32] P. Ronhovde and Z. Nussinov, “Multiresolution community detection for megascale networks by information-based replica correlations,” Physical Review E, vol. 80, no. 1, p. 016109, 2009.
  • [33] M. Bellingeri, D. Cassi, and S. Vincenzi, “Efficiency of attack strategies on complex model and real-world networks,” Physica A: Statistical Mechanics and its Applications, vol. 414, pp. 174–180, 2014.
  • [34] B. Karrer, E. Levina, and M. E. Newman, “Robustness of community structure in networks,” Physical Review E, vol. 77, no. 4, p. 046119, 2008.
  • [35] S. Yu, M. Zhao, C. Fu, H. Huang, X. Shu, Q. Xuan, and G. Chen, “Target defense against link-prediction-based attacks via evolutionary perturbations,” arXiv preprint arXiv:1809.05912, 2018.
  • [36] H. Dai, H. Li, T. Tian, X. Huang, L. Wang, J. Zhu, and L. Song, “Adversarial attack on graph structured data,” arXiv preprint arXiv:1806.02371, 2018.
  • [37] D. Zügner, A. Akbarnejad, and S. Günnemann, “Adversarial attacks on neural networks for graph data,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.   ACM, 2018, pp. 2847–2856.
  • [38] A. Bojchevski and S. Günnemann, “Adversarial attacks on node embeddings,” arXiv preprint arXiv:1809.01093, 2018.
  • [39] J. Chen, Y. Wu, X. Xu, Y. Chen, H. Zheng, and Q. Xuan, “Fast gradient attack on network embedding,” arXiv preprint arXiv:1809.02797, 2018.
  • [40] A.-L. Barabási and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286, no. 5439, pp. 509–512, 1999.
  • [41] J. Holland, “Adaptation in natural and artificial systems: an introductory analysis with application to biology,” Control and artificial intelligence, 1975.
  • [42] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,” Science, vol. 220, no. 4598, pp. 671–680, 1983.
  • [43] R. Eberhart and J. Kennedy, “A new optimizer using particle swarm theory,” in Micro Machine and Human Science, 1995. MHS’95., Proceedings of the Sixth International Symposium on.   IEEE, 1995, pp. 39–43.
  • [44] W. W. Zachary, “An information flow model for conflict and fission in small groups,” Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
  • [45] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, “The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations,” Behavioral Ecology and Sociobiology, vol. 54, no. 4, pp. 396–405, 2003.
  • [46] M. E. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical Review E, vol. 69, no. 2, p. 026113, 2004.
  • [47] L. Danon, A. Diaz-Guilera, J. Duch, and A. Arenas, “Comparing community structure identification,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2005, no. 09, p. P09008, 2005.