K-Core Minimization: A Game Theoretic Approach


Sourav Medya University of California, Santa Barbara medya@cs.ucsb.edu Tiyani Ma University of California, Santa Barbara timtianyima@gmail.com Arlei Silva University of California, Santa Barbara arlei@cs.ucsb.edu  and  Ambuj Singh University of California, Santa Barbara ambuj@cs.ucsb.edu
Abstract.

k-cores are maximal induced subgraphs where all vertices have degree at least k. These dense patterns have applications in community detection, network visualization and protein function prediction. However, k-cores can be quite unstable to network modifications, which motivates the question: How resilient is the k-core structure of a network, such as the Web or Facebook, to edge deletions? We investigate this question from an algorithmic perspective. More specifically, we study the problem of computing a small set of edges whose removal minimizes the k-core structure of a network.

This paper provides a comprehensive characterization of the hardness of the k-core minimization problem (KCM), including inapproximability and fixed-parameter intractability. Motivated by such a challenge in terms of algorithm design, we propose a novel algorithm inspired by Shapley values, a cooperative game-theoretic concept, that is able to leverage the strong interdependencies in the effects of edge removals in the search space. As computing Shapley values is also NP-hard, we efficiently approximate them using a randomized algorithm with probabilistic guarantees. Our experiments, using several real datasets, show that the proposed algorithm outperforms competing solutions in terms of k-core minimization while being able to handle large graphs. Moreover, we illustrate how KCM can be applied in the analysis of the k-core resilience of networks.

K-core, Network Resilience, Graph Algorithms

1. Introduction

k-cores play an important role in revealing the higher-order organization of networks. A k-core (Seidman, 1983) is a maximal induced subgraph where all vertices have internal degree of at least k. These cohesive subgraphs have been applied to model users' engagement and viral marketing in social networks (Bhawalkar et al., 2015; Kitsak et al., 2010). Other applications include anomaly detection (Shin et al., 2016), community discovery (Peng et al., 2014), protein function prediction (You et al., 2013), and visualization (Alvarez-Hamelin et al., 2006; Carmi et al., 2007). However, the k-core structure can be quite unstable under network modification. For instance, removing only a few edges from the graph might lead to the collapse of its core structure. This motivates the k-core minimization problem: Given a graph G and a constant k, find a small set of edges whose removal minimizes the size of the k-core structure (Zhu et al., 2018).

We motivate k-core minimization using the following applications: (1) Monitoring: Given an infrastructure or technological network, which edges should be monitored for attacks (Xiangyu et al., 2013; Laishram et al., 2018)? (2) Defense: Which communication channels should be blocked in a terrorist network in order to destabilize its activities (Pedahzur and Perliger, 2006; Perliger and Pedahzur, 2011)? (3) Design: How can unraveling be prevented in a social or biological network by strengthening connections between nodes (Bhawalkar et al., 2015; Morone et al., 2018)?

Consider a specific application of k-cores to online social networks (OSNs). OSN users tend to perform activities (e.g., joining a group, playing a game) if enough of their friends do the same (Burke et al., 2009). Thus, strengthening critical links between users is key to the long-term popularity, and even survival, of the network (Farzan et al., 2011). This scenario can be modeled using k-cores. Initially, every engaged user is in the k-core. The removal of a few links (e.g., unfriending, unfollowing) might not only cause a couple of users to leave the network but produce a mass exodus due to cascading effects. This process can help us understand the decline and death of OSNs such as Friendster (Garcia et al., 2013).

Figure 1. K-core minimization for an illustrative example: (a) initial graph, where all the vertices are in the k-core; (b) removing one edge causes all the vertices to leave the k-core; (c) removing a different edge causes only six vertices to leave the k-core.

k-core minimization (KCM) can be motivated both from the perspective of a centralized agent who protects the structure of a network and from that of an adversary who aims to disrupt it. Moreover, our problem can also be applied to measure network resilience (Laishram et al., 2018).

We illustrate KCM in Figure 1. An initial graph (Figure 1(a)), where all vertices are in the k-core, is modified by the removal of a single edge. The graphs in Figures 1(b) and 1(c) are the result of removing two different edges. While the first removal causes all the vertices to leave the 3-core, the second has a smaller effect: four vertices remain in the 3-core. Our goal is to identify a small set of edges whose removal minimizes the size of the k-core.

From a theoretical standpoint, for any objective function of interest, we can define a search problem (e.g., k-core decomposition) and a corresponding modification problem, such as k-core minimization. In this paper, we show that, unlike its search version (Batagelj and Zaveršnik, 2011), KCM is NP-hard. Furthermore, there is no polynomial-time algorithm that achieves a constant-factor approximation for our problem. Intuitively, the main challenge stems from the strong combinatorial nature of the effects of edge removals. While removing a single edge may have no immediate effect, the deletion of a small number of edges might cause the collapse of the k-core structure. This behavior differs from more popular problems in graph combinatorial optimization, such as submodular optimization, where a simple greedy algorithm provides constant-factor approximation guarantees.

The algorithm for k-core minimization proposed in this paper applies the concept of Shapley values (SVs), which, in the context of cooperative game theory, measure the contribution of players to coalitions (Shapley, 1953). Our algorithm selects the edges with the largest Shapley values in order to account for the joint effect (or cooperation) of multiple edges. Since computing SVs is NP-hard, we approximate them in polynomial time via a randomized algorithm with quality guarantees.

Recent papers have introduced the KCM problem (Zhu et al., 2018) and its vertex version (Zhang et al., 2017), where the goal is to delete a few vertices such that the k-core structure is minimized. However, our work provides a stronger theoretical analysis and more effective algorithms that can be applied to both problems. In particular, we show that our algorithm outperforms the greedy approach proposed in (Zhu et al., 2018).

Our main contributions are summarized as follows:

  • We study the k-core minimization (KCM) problem, which consists of finding a small set of edges whose removal minimizes the size of the k-core structure of a network.

  • We show that KCM is NP-hard, and even NP-hard to approximate within any constant factor for k ≥ 3. We also discuss the parameterized complexity of KCM and show that the problem is W[2]-hard for the same values of k.

  • Given the above inapproximability result, we propose a randomized Shapley Value based algorithm that efficiently accounts for the interdependence among the candidate edges for removal.

  • We show that our algorithm is both accurate and efficient using several datasets. Moreover, we illustrate how KCM can be applied to profile the structural resilience of real networks.

2. Problem Definition

We assume G = (V, E) to be an undirected and unweighted graph with a set of vertices V (|V| = n) and a set of edges E (|E| = m). Let d(u, G) denote the degree of vertex u in G. An induced subgraph H of G is a subgraph such that, whenever u and v are vertices of H and (u, v) ∈ E, the edge (u, v) also belongs to H. The k-core (Seidman, 1983) of a network is defined below.

Definition 2.1.

k-Core: The k-core of a graph G, denoted by G_k(V_k, E_k), is defined as a maximal induced subgraph of G in which every vertex has degree at least k.

Figure 2 shows an example: the graphs in Figures 2(b) and 2(c) are k-cores of the initial graph in Figure 2(a) for two increasing values of k. Note that the core in Figure 2(c) is a subgraph of the one in Figure 2(b). Let κ(u) denote the core number of node u in G, i.e., the largest k such that u belongs to the k-core; if k1 ≤ k2, the k2-core is contained in the k1-core. The k-core decomposition of G can be computed in O(n + m) time by recursively removing vertices with degree lower than k (Batagelj and Zaveršnik, 2011).
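As an illustration of this peeling procedure, the following is a minimal, self-contained Java sketch (our own, not the implementation used in the paper); it assumes the graph is given as an adjacency list over integer node identifiers and returns the vertex set of the k-core.

import java.util.*;

// Minimal sketch: compute the k-core of an undirected simple graph by repeatedly
// peeling vertices of degree smaller than k.
public class KCorePeeling {

    // adj: adjacency list (undirected, no self-loops); returns the vertices of the k-core.
    public static Set<Integer> kCore(Map<Integer, Set<Integer>> adj, int k) {
        Map<Integer, Integer> deg = new HashMap<>();
        Deque<Integer> toPeel = new ArrayDeque<>();
        for (Map.Entry<Integer, Set<Integer>> entry : adj.entrySet()) {
            deg.put(entry.getKey(), entry.getValue().size());
            if (entry.getValue().size() < k) toPeel.add(entry.getKey());
        }
        Set<Integer> removed = new HashSet<>();
        while (!toPeel.isEmpty()) {
            int u = toPeel.poll();
            if (!removed.add(u)) continue;                 // already peeled
            for (int v : adj.get(u)) {
                if (removed.contains(v)) continue;
                deg.merge(v, -1, Integer::sum);            // v loses the supporter u
                if (deg.get(v) == k - 1) toPeel.add(v);    // v just dropped below k
            }
        }
        Set<Integer> core = new HashSet<>(adj.keySet());
        core.removeAll(removed);
        return core;
    }

    public static void main(String[] args) {
        // Toy graph: a 4-clique {0,1,2,3} with a pendant vertex 4 attached to node 3.
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        int[][] edges = {{0,1},{0,2},{0,3},{1,2},{1,3},{2,3},{3,4}};
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], x -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], x -> new HashSet<>()).add(e[0]);
        }
        System.out.println("3-core: " + kCore(adj, 3));    // the 4-clique {0, 1, 2, 3}
    }
}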

Let G_B = (V, E \ B) be the modified graph after deleting a set B of b edges. Deleting an edge reduces the degrees of its two endpoints and possibly their core numbers, and the reduction in core number might propagate to other vertices. For instance, the vertices of a simple cycle are in the 2-core, but deleting any single edge from the cycle moves all of its vertices to the 1-core. Let n_k(G) and e_k(G) be the number of nodes and edges, respectively, in the k-core G_k.

Symbols | Definitions and Descriptions
G(V, E) | Input graph, with vertex set V and edge set E
n | Number of nodes in the graph
m | Number of edges in the graph
G_k(V_k, E_k) | The k-core of graph G
n_k(G) = |V_k| | Number of nodes in the k-core of G
e_k(G) = |E_k| | Number of edges in the k-core of G
Γ | Candidate set of edges
b | Budget (number of edges to delete)
ν(S) | The value of a coalition S of edges
Φ_e | The Shapley value of an edge e
P(π, e) | Set of edges before e in permutation π
Table 1. Frequently used symbols
Definition 2.2.

Reduced k-Core: A reduced k-core is the k-core of the modified graph G_B, i.e., the k-core that remains after the edges in B have been deleted from G.

Example 2.3.

Figures 3(a) and 3(b) show an initial graph G and the modified graph G_B (where a single edge has been deleted), respectively. In G, all the nodes are in the k-core. Deleting the edge pushes its two endpoints out of the k-core and, as a consequence, additional vertices leave the k-core as well.

Definition 2.4.

k-Core Minimization (KCM): Given a candidate edge set Γ ⊆ E and a budget b, find a set B ⊆ Γ of b edges to be removed such that the size of the reduced k-core, n_k(G_B), is minimized, or, equivalently, n_k(G) − n_k(G_B) is maximized.
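Stated compactly, using the symbols of Table 1 (with n_k(G) denoting the number of nodes in the k-core of G, and G_B the graph obtained by deleting the edge set B), the problem can be written as:

$B^{*} \;=\; \underset{B \subseteq \Gamma,\ |B| = b}{\arg\min}\; n_k(G_B) \;=\; \underset{B \subseteq \Gamma,\ |B| = b}{\arg\max}\; \big[\, n_k(G) - n_k(G_B) \,\big].$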

Example 2.5.

Figure 3(a) shows an initial graph G where all the nodes are in the k-core. Deleting one pair of edges pushes all the vertices out of the k-core, whereas deleting a different pair of edges has no effect on the k-core structure (assuming a budget of two edges).

Clearly, edges differ in how much their removal affects the k-core. Next, we discuss strong inapproximability results for the KCM problem along with its parameterized complexity.

Figure 2. Examples of (a) a graph G and (b, c) its k-core structures for two increasing values of k.
Figure 3. Example of the changes in the core structure caused by the deletion of a single edge: (a) in the initial graph, all the nodes are in the k-core; (b) in the modified graph, several nodes have left the k-core.

2.1. Hardness and Approximability

The hardness of the KCM problem stems from two major facts: (1) there is a combinatorial number of choices of b edges from the candidate set, and (2) there might be strong dependencies in the effects of edge removals (e.g., no effect for a single edge but cascading effects for subsets of edges). We show that KCM is NP-hard to approximate within any constant factor for k ≥ 3.

Theorem 1.

The KCM problem is NP-hard for k = 1 and k = 2.

Proof.

For both values of k, the reduction is from 2-MINSAT (Kohli et al., 1994). Details of this proof are given in the Appendix. ∎

Theorem 2.

The KCM problem is NP-hard, and it is also NP-hard to approximate within any constant factor, for every k ≥ 3.

Proof.

We sketch the proof for k = 3 (the argument is similar for k > 3).

Let an instance of the Set Union Knapsack Problem (SK) (Goldschmidt et al., 1994) be given by a set of items U = {u_1, …, u_n'}, a collection of subsets S = {S_1, …, S_r} with each S_j ⊆ U, a subset profit function p, an item weight function w, and a capacity C. For a collection T ⊆ S, the weighted union of T is the total weight of the items in the union of the subsets in T, and the profit of T is the total profit of the subsets in T. The problem is to find a collection T ⊆ S such that its weighted union does not exceed C and its profit is maximized. SK is NP-hard to approximate within a constant factor (Arulselvan, 2014).

We reduce a version of with equal profits and weights (also NP-hard to approximate) to the KCM problem. The graph is constructed as follows. For each , we create a cycle of vertices in and add , as edges. We also add vertices to with eight edges where the four vertices to form a clique with six edges. The other two edges are and . Moreover, for each subset we create five vertices, to and add eight edges as in . In the edge set , an edge will be added if . Additionally, if , the edge will be added to . Figure 4 illustrates our construction for a set .

In KCM, the number of edges to be removed is the budget, . The candidate set of edges, is the set of all the edges with form (the dotted edges in Fig. 4). Initially all the nodes in are in the -core. Our claim is, for any solution of an instance of the mentioned there is a corresponding solution set of edges, (where ) in of the KCM problem, such that if the edges in are removed.

The nodes in any and the node will be in the -core if the edge gets removed. So, removal of any edges from enforces nodes to go to the -core. But each will be in the -core iff all its neighbours in s go to the -core after the removal of edges in . This proves our claim. ∎

Theorem 2 shows that there is no polynomial-time constant-factor approximation for KCM when k ≥ 3. This contrasts with well-known NP-hard graph combinatorial problems in the literature (Kempe et al., 2003). In the next section, we explore the hardness of our problem further, in terms of exact exponential-time algorithms with respect to the problem parameters.

Figure 4. Example construction for the hardness reduction from SK.

2.2. Parameterized Complexity

Several NP-hard problems admit exact algorithms whose running time is exponential only in the size of a parameter. For instance, the NP-hard Vertex Cover problem can be solved via a bounded search algorithm whose running time is exponential only in b (Balasubramanian et al., 1998), where b and |G| denote the budget (solution size) and the size of the graph instance, respectively. Vertex Cover is therefore fixed-parameter tractable (FPT), and if we are only interested in small b, we can solve the problem in polynomial time. We investigate whether the KCM problem is also in the FPT class.

A parameterized problem instance is comprised of an instance x in the usual sense and a parameter b. A problem with parameter b is called fixed-parameter tractable (FPT) (Flum and Grohe, 2006) if it is solvable in time f(b) · p(|x|), where f is an arbitrary computable function of b and p is a polynomial in the input size |x|. Just as in NP-hardness, there exists a hierarchy of complexity classes above FPT, and being hard for one of these classes is evidence that the problem is unlikely to be FPT. Indeed, assuming the Exponential Time Hypothesis, a problem that is W[1]-hard does not belong to FPT. The main classes in this hierarchy form the chain FPT ⊆ W[1] ⊆ W[2] ⊆ ⋯. Generally speaking, a problem is harder when it is hard for a higher class of this hierarchy in terms of parameterized complexity. For instance, Dominating Set is in W[2] and is considered to be harder than Maximum Independent Set, which is in W[1].

Definition 2.6.

Parameterized Reduction (Flum and Grohe, 2006): Let A and B be parameterized problems. A parameterized reduction from A to B is an algorithm that, given an instance (x, b) of A, outputs an instance (x′, b′) of B such that: (1) (x, b) is a yes-instance of A iff (x′, b′) is a yes-instance of B; (2) b′ ≤ g(b) for some computable (possibly exponential) function g; and (3) the running time of the algorithm is f(b) · p(|x|) for a computable function f and a polynomial p.

Theorem 3.

The KCM problem is not in FPT; in fact, it is W[2]-hard when parameterized by the budget b, for k ≥ 3.

Proof.

We show a parameterized reduction from the Set Cover problem, which is known to be W[2]-hard (Bonnet et al., 2016). The details of the proof are given in the Appendix. ∎

Motivated by these strong hardness and inapproximability results, we next consider some practical heuristics for the KCM problem.

3. Algorithms

According to Theorems 2 and 3, obtaining an optimal solution (or a constant-factor approximation) for k-core minimization requires enumerating all possible size-b subsets of the candidate edge set, unless P = NP. In this section, we propose efficient heuristics for KCM.

3.1. Greedy Cut

For KCM, we only need to consider the current k-core of the graph, G_k(V_k, E_k). The remaining nodes of G are already in a lower-than-k core and can be disregarded. We define the vulnerable set VS(e, G_k) of an edge e as the set of nodes that would be demoted to a lower-than-k core if e were deleted from the current core graph G_k. Algorithm 1 (GC) is a greedy approach for selecting an edge set B (|B| = b) that maximizes the k-core reduction, n_k(G) − n_k(G_B). In each step, it chooses, among the candidate edges in Γ, the edge whose removal demotes the largest number of nodes (i.e., maximizes |VS(e, G_k)|). The specific procedure for computing the vulnerable set (Algorithm 4, computeVS) and its running time are described in the Appendix. The overall running time of GC follows from evaluating every candidate edge in each of the b steps.

Local Update (Algorithm 2): After the removal of the best edge in each step, the current core graph G_k needs to be updated. Recomputing the k-core from scratch after every deletion would be wasteful. Instead, a more efficient approach is to update only the region affected by the deleted edge e = (u, v). After e is deleted, u is removed from the core if its remaining degree falls below k (and similarly for v). This triggers a cascade of node removals (together with the associated edges). Let R(w) be the set of nodes, already removed from G_k, that are neighbours of a node w; then w must also be removed once d(w, G_k) − |R(w)| < k.

3.2. Shapley Value Based Algorithm

The greedy algorithm discussed in the last section is unaware of some dependencies between the candidates in the solution set. For instance, consider a graph in which every edge is equally important to the greedy criterion, i.e., each single-edge deletion has the same (possibly null) effect on the k-core. In this scenario, GC will choose an edge arbitrarily. However, removing an optimally chosen set of a few edges might collapse the core structure entirely, for example by turning the graph into a tree (a 1-core). To capture these dependencies, we adopt a cooperative game-theoretic concept named the Shapley value (Shapley, 1953). Our goal is to form a coalition of edges (players) and divide the total gain of this coalition fairly among the edges inside it.

Input: G_k(V_k, E_k), Γ, b, k
Output: B: Set of edges to delete
1  B ← ∅
2  while |B| < b do
3      e* ← arg max_{e ∈ Γ} |VS(e, G_k)|
4      B ← B ∪ {e*}
5      G_k ← LocalUpdate(G_k, e*, k)
6      Γ ← Γ \ {e*}
7  return B
Algorithm 1 Greedy Cut (GC)
Input: G_k(V_k, E_k), k, e = (u, v)
1  Remove e from G_k and update d(u, G_k) and d(v, G_k)
2  Q ← ∅, R ← ∅
3  if d(u, G_k) < k then
4      add u to Q
5  if d(v, G_k) < k then
6      add v to Q
7  while Q ≠ ∅ do
8      remove a node w from Q
9      for each neighbour x of w in G_k do
10         d(x, G_k) ← d(x, G_k) − 1
11         if d(x, G_k) < k and x ∉ Q ∪ R then
12             add x to Q
13     remove w and its incident edges from G_k
14     add w to R
Algorithm 2 LocalUpdate
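A minimal Java sketch of this local update cascade follows (an illustration under an assumed adjacency-list representation; the class, method, and variable names are ours, not the paper's):

import java.util.*;

// Sketch of LocalUpdate (Algorithm 2): delete edge (u, v) from the current core graph
// and cascade the removal of every node whose degree drops below k.
public class LocalUpdateSketch {

    // Mutates 'core' (adjacency list of the current k-core) in place.
    static void deleteEdgeAndUpdate(Map<Integer, Set<Integer>> core, int k, int u, int v) {
        core.get(u).remove(v);
        core.get(v).remove(u);
        Deque<Integer> queue = new ArrayDeque<>();
        if (core.get(u).size() < k) queue.add(u);
        if (core.get(v).size() < k) queue.add(v);
        while (!queue.isEmpty()) {
            int w = queue.poll();
            if (!core.containsKey(w)) continue;            // already removed
            for (int x : core.get(w)) {
                core.get(x).remove(w);                     // drop the incident edge
                if (core.get(x).size() < k) queue.add(x);  // x no longer has k supporters
            }
            core.remove(w);                                // w leaves the k-core
        }
    }

    public static void main(String[] args) {
        // Toy example: a 4-clique {0,1,2,3}, which is a 3-core.
        Map<Integer, Set<Integer>> core = new HashMap<>();
        int[][] edges = {{0,1},{0,2},{0,3},{1,2},{1,3},{2,3}};
        for (int[] e : edges) {
            core.computeIfAbsent(e[0], x -> new HashSet<>()).add(e[1]);
            core.computeIfAbsent(e[1], x -> new HashSet<>()).add(e[0]);
        }
        deleteEdgeAndUpdate(core, 3, 0, 1);   // deleting one clique edge collapses the 3-core
        System.out.println(core.keySet());    // prints [] (every node has left the 3-core)
    }
}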

3.2.1. Shapley Value

The Shapley value of an edge in the context of KCM is defined as follows. Let the value of a coalition of edges S ⊆ Γ be ν(S) = n_k(G) − n_k(G_S), i.e., the number of nodes that leave the k-core when the edges in S are deleted. Given an edge e ∈ Γ and a subset S ⊆ Γ such that e ∉ S, the marginal contribution of e to S is:

$\Delta_e(S) = \nu(S \cup \{e\}) - \nu(S)$   (1)

Let Π be the set of all permutations of the edges in Γ, and let P(π, e) be the set of edges that appear before e in a permutation π. The Shapley value Φ_e of e is the average of its marginal contributions to the set of edges that appear before it, over all permutations:

$\Phi_e = \frac{1}{|\Pi|} \sum_{\pi \in \Pi} \Delta_e\big(P(\pi, e)\big)$   (2)

Shapley values capture the importance of an edge inside a set (or coalition) of edges. However, computing exact Shapley values requires considering all permutations of the candidate edges. Next, we show how to efficiently approximate the Shapley value of each edge via sampling.

3.2.2. Approximate Shapley Value Based Algorithm

Algorithm 3 (Shapley Value Based Cut, SV) selects the best b edges according to their approximate Shapley values, which are computed from a sampled set of permutations Π̂. For each permutation in Π̂, we compute the marginal gains of all the edges; the accumulated gains are then normalized by the sample size M = |Π̂|. In terms of time complexity, the nested loops over permutations and edges dominate: they perform M · |Γ| marginal-gain computations, each of which touches at most the n_k nodes and e_k edges of G_k, where n_k and e_k are the number of nodes and edges in G_k, respectively. A runnable sketch of this procedure is given after Algorithm 3.

Input: G_k(V_k, E_k), Γ, b, k, number of samples M
Output: B: Set of edges to delete
1  Φ̂_e ← 0 for all e ∈ Γ
2  Generate a set Π̂ of M random permutations of the edges in Γ
3  for each permutation π ∈ Π̂ do
4      for each edge e ∈ Γ do
5          Φ̂_e ← Φ̂_e + ν(P(π, e) ∪ {e}) − ν(P(π, e))
6  Φ̂_e ← Φ̂_e / M for all e ∈ Γ
7  Select the top b edges of Γ ranked by Φ̂_e into B
8  return B
Algorithm 3 Shapley Value Based Cut (SV)
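To make the sampling procedure concrete, the following is a compact Java sketch of the estimator (an illustration only, not the authors' implementation). For simplicity it evaluates the coalition value ν(S), the number of nodes that leave the k-core when the edges in S are deleted, from scratch with a peeling routine, which is simpler but slower than the incremental local updates of Algorithms 2 and 4; the graph representation, edge encoding, and method names are assumptions of this sketch. Selecting the b edges with the largest estimates then yields the SV solution.

import java.util.*;

// Sketch of the sampling-based Shapley value estimator (in the spirit of Algorithm 3).
public class ShapleyEstimator {

    // Order-independent 64-bit encoding of an undirected edge (u, v).
    static long key(int u, int v) {
        int a = Math.min(u, v), b = Math.max(u, v);
        return ((long) a << 32) | (b & 0xffffffffL);
    }

    // Number of nodes in the k-core after the edges in 'deleted' are removed.
    // Recomputed from scratch by peeling, for clarity rather than efficiency.
    static int kCoreSize(Map<Integer, Set<Integer>> adj, int k, Set<Long> deleted) {
        Map<Integer, Integer> deg = new HashMap<>();
        Deque<Integer> queue = new ArrayDeque<>();
        for (int u : adj.keySet()) {
            int d = 0;
            for (int v : adj.get(u)) if (!deleted.contains(key(u, v))) d++;
            deg.put(u, d);
            if (d < k) queue.add(u);
        }
        Set<Integer> removed = new HashSet<>();
        while (!queue.isEmpty()) {
            int u = queue.poll();
            if (!removed.add(u)) continue;
            for (int v : adj.get(u)) {
                if (removed.contains(v) || deleted.contains(key(u, v))) continue;
                deg.merge(v, -1, Integer::sum);
                if (deg.get(v) == k - 1) queue.add(v);
            }
        }
        return adj.size() - removed.size();
    }

    // Approximate Shapley values of the candidate edges from M sampled permutations.
    static Map<Long, Double> approxShapley(Map<Integer, Set<Integer>> adj, int k,
                                           List<int[]> candidates, int M, Random rnd) {
        Map<Long, Double> phi = new HashMap<>();
        for (int[] e : candidates) phi.put(key(e[0], e[1]), 0.0);
        int fullCore = kCoreSize(adj, k, Collections.<Long>emptySet());
        for (int s = 0; s < M; s++) {
            List<int[]> perm = new ArrayList<>(candidates);
            Collections.shuffle(perm, rnd);
            Set<Long> prefix = new HashSet<>();
            int prevValue = 0;                               // nu(edges seen so far)
            for (int[] e : perm) {
                prefix.add(key(e[0], e[1]));
                int value = fullCore - kCoreSize(adj, k, prefix);
                phi.merge(key(e[0], e[1]), (double) (value - prevValue), Double::sum);
                prevValue = value;                           // marginal gain accumulated
            }
        }
        phi.replaceAll((edge, sum) -> sum / M);              // average over the samples
        return phi;
    }

    public static void main(String[] args) {
        // Toy graph: a 4-clique {0,1,2,3} plus a triangle {3,4,5}; all nodes are in the 2-core.
        int[][] edges = {{0,1},{0,2},{0,3},{1,2},{1,3},{2,3},{3,4},{4,5},{5,3}};
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        List<int[]> candidates = new ArrayList<>();
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], x -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], x -> new HashSet<>()).add(e[0]);
            candidates.add(e);
        }
        // Keys of the printed map encode edges as (min << 32 | max).
        System.out.println(approxShapley(adj, 2, candidates, 200, new Random(7)));
    }
}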

3.2.3. Analysis

In the previous section, we presented a fast sampling algorithm (SV) for k-core minimization using Shapley values. Here, we study the quality of the approximation provided by SV as a function of the number of samples. We show that our algorithm is nearly optimal with respect to each Shapley value with high probability. More specifically, given an error parameter ε and a confidence parameter ℓ, SV takes M samples, where M is polynomial in n_k, 1/ε, and ln(|Γ|ℓ), to approximate the Shapley values within error ε with probability at least 1 − 1/ℓ.

We sample, uniformly at random and with replacement, a set Π̂ = {π_1, …, π_M} of M permutations from the set Π of all permutations of Γ; each permutation is chosen with probability 1/|Π|. Let Φ̂_e be the approximate Shapley value of e based on Π̂, and let X_i be the random variable that denotes the marginal gain of e in the i-th sampled permutation. The estimated Shapley value is then Φ̂_e = (1/M) Σ_{i=1}^{M} X_i. Note that E[Φ̂_e] = Φ_e.

Theorem 3.1.

Given ε > 0, a positive integer ℓ, and a sample Π̂ of M independent permutations, where $M \ge \frac{n_k^2}{2\epsilon^2}\ln(2|\Gamma|\ell)$; then:

$P\big(\,|\hat{\Phi}_e - \Phi_e| \le \epsilon\ \ \forall e \in \Gamma\,\big) \;\ge\; 1 - \frac{1}{\ell},$

where n_k denotes the number of nodes in G_k.

Proof.

We start by analyzing the Shapley value of a single edge. Because the sampled permutations are independent and each marginal gain is an unbiased estimate, we can apply Hoeffding's inequality (Hoeffding, 1963) to bound the estimation error for an edge e:

$P\big(\,|\hat{\Phi}_e - \Phi_e| \ge \epsilon\,\big) \;\le\; 2\exp\!\left(-\frac{2M^2\epsilon^2}{\sum_{i=1}^{M}(b_i - a_i)^2}\right)$   (3)

where $\hat{\Phi}_e = \frac{1}{M}\sum_{i=1}^{M} X_i$, $E[\hat{\Phi}_e] = \Phi_e$, and each $X_i$ is strictly bounded by the interval $[a_i, b_i]$. The marginal gain of e in any permutation is at least 0 and at most $n_k$, so $b_i - a_i \le n_k$ for every i. As a consequence:

$P\big(\,|\hat{\Phi}_e - \Phi_e| \ge \epsilon\,\big) \;\le\; 2\exp\!\left(-\frac{2M\epsilon^2}{n_k^2}\right).$

Thus, the following holds for each edge e ∈ Γ:

$P\big(\,|\hat{\Phi}_e - \Phi_e| \le \epsilon\,\big) \;\ge\; 1 - 2\exp\!\left(-\frac{2M\epsilon^2}{n_k^2}\right).$

Using the above bound, we can derive a joint sample bound for all the edges in Γ. Let $A_e$ be the event that $|\hat{\Phi}_e - \Phi_e| \ge \epsilon$; then $P(A_e) \le 2\exp(-2M\epsilon^2/n_k^2)$ for every e ∈ Γ.

Applying the union bound over all the edges in Γ, we get:

$P\big(\,\exists\, e \in \Gamma : |\hat{\Phi}_e - \Phi_e| \ge \epsilon\,\big) \;\le\; 2\,|\Gamma|\exp\!\left(-\frac{2M\epsilon^2}{n_k^2}\right).$

By choosing $M \ge \frac{n_k^2}{2\epsilon^2}\ln(2|\Gamma|\ell)$, the right-hand side is at most $1/\ell$; i.e., $|\hat{\Phi}_e - \Phi_e| \le \epsilon$ holds simultaneously for all e ∈ Γ with probability at least $1 - 1/\ell$.

This ends the proof. ∎
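For reference, the general form of Hoeffding's inequality (Hoeffding, 1963) instantiated in (3) is the following standard statement: for independent random variables $X_1, \dots, X_M$ with $a_i \le X_i \le b_i$ and sample mean $\bar{X} = \frac{1}{M}\sum_{i=1}^{M} X_i$,

$P\big(\,|\bar{X} - E[\bar{X}]| \ge t\,\big) \;\le\; 2\exp\!\left(-\frac{2M^2t^2}{\sum_{i=1}^{M}(b_i - a_i)^2}\right) \quad \text{for every } t > 0.$

Inequality (3) is obtained by taking $\bar{X} = \hat{\Phi}_e$ and $t = \epsilon$.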

Next, we apply Theorem 3.1 to analyze the quality of the set produced by Algorithm 3 (SV) compared with the result of an exact algorithm (without sampling). Let the exact Shapley values of the top b edges be Φ_{e_1} ≥ Φ_{e_2} ≥ ⋯ ≥ Φ_{e_b}. The set produced by Algorithm 3 (SV) has estimated Shapley values Φ̂_{e'_1} ≥ Φ̂_{e'_2} ≥ ⋯ ≥ Φ̂_{e'_b}. We can prove the following result regarding the SV algorithm.

Corollary 4.

For any ε > 0, any i with 1 ≤ i ≤ b, a positive integer ℓ, and a sample Π̂ of M independent permutations, where $M \ge \frac{n_k^2}{2\epsilon^2}\ln(2|\Gamma|\ell)$:

$P\big(\,\Phi_{e'_i} \ge \Phi_{e_i} - 2\epsilon\,\big) \;\ge\; 1 - \frac{1}{\ell},$

where n_k denotes the number of nodes in G_k.

Proof.

For all edges e ∈ Γ, Theorem 3.1 shows that |Φ̂_e − Φ_e| ≤ ε with probability at least 1 − 1/ℓ. So, with the same probability, Φ̂_{e'_i} ≥ min_{j ≤ i} Φ̂_{e_j} ≥ Φ_{e_i} − ε (the i-th largest estimate is at least the smallest estimate among the exact top-i edges) and Φ_{e'_i} ≥ Φ̂_{e'_i} − ε. Combining the two inequalities, Φ_{e'_i} ≥ Φ_{e_i} − 2ε with the same probability. ∎

At this point, it is relevant to revisit the hardness of approximation result from Theorem 2 in the light of Corollary 4. First, SV does not directly minimize the KCM objective function (see Definition 2.4). Instead, it provides a score for each candidate edge e based on how different permutations of edges containing e minimize the KCM objective, under the assumption that such scores are divided fairly among the edges involved. Notice that this assumption is not part of the KCM problem, and thus Shapley values play the role of a heuristic. Moreover, Corollary 4 provides a guarantee of the polynomial-time randomized approximation scheme (PRAS) type, rather than a constant-factor approximation, and it refers to the exact Shapley values of the top b edges, not to the KCM objective function. We evaluate how SV performs regarding the KCM objective in our experiments.

3.2.4. Generalizations

Sampling-based approximate Shapley values can also be applied to other relevant combinatorial problems on graphs for which the objective function is not submodular. Examples of such problems include k-core anchoring (Bhawalkar et al., 2015), influence minimization (Kimura et al., 2008), and network design (Dilkina et al., 2011).

3.3. Optimizations for GC and SV

We briefly discuss optimizations for the Greedy Cut (GC) and Shapley Value Based Cut (SV) algorithms introduced in this section. The objective is to reduce the number of evaluations of candidate edges in GC and SV via pruning. To achieve this goal, we introduce the concept of edge dominance. Let VS(e, G_k) be the set of vertices that would be removed from the k-core if e were deleted from G_k. An edge e′ is dominated by e if at least one endpoint of e′ belongs to VS(e, G_k); in that case, VS(e′, G_k) ⊆ VS(e, G_k). We can therefore skip the evaluation of e′ whenever it appears after e among the candidate edges.

The concept of edge dominance is applied to speed up both GC and SV. In GC, we do not compute the marginal gain of any edge that is dominated by a previously evaluated edge. For SV, we only consider non-dominated edges in a permutation. A more detailed discussion of these pruning schemes, along with a short code sketch, is provided in the Appendix. Notice that these optimizations do not affect the output of the algorithms. We evaluate the performance gains due to pruning in our experimental evaluation.

Dataset | Nodes | Edges | k_max | Type
Yeast | 1K | 2.6K | 6 | Biological
Human | 3.6K | 8.9K | 8 | Biological
email-Enron (EE) | 36K | 183K | 42 | Email
Facebook (FB) | 60K | 1.5M | 52 | OSN
web-Stanford (WS) | 280K | 2.3M | 70 | Web graph
DBLP (DB) | 317K | 1M | 113 | Co-authorship
com-Amazon (CA) | 335K | 926K | 6 | Co-purchasing
Erdos-Renyi (ER) | 60K | 800K | 19 | Synthetic
Table 2. Dataset descriptions and statistics. The value of k_max (or degeneracy) is the largest k for which the graph contains a non-empty k-core.

4. Experiments

In this section, we evaluate the algorithms for k-core minimization proposed in this paper—Greedy (GC) and Shapley Value Based Cut (SV)—against baseline solutions using several large-scale graphs. Sections 4.2 and 4.3 are focused on the quality results (k-core minimization) and the running time of the algorithms, respectively. Moreover, in Section 4.4, we show how k-core minimization can be applied in the analysis of the structural resilience of networks.

4.1. Experimental Setup

All the experiments were conducted on an Intel Core i7-4720HQ machine running Windows 10. Algorithms were implemented in Java. The source code of our implementations will be made open-source once this paper is accepted.

Datasets: The real datasets used in our experiments are available online, mostly from SNAP (https://snap.stanford.edu). The Human and Yeast datasets are available in (Moser et al., 2009); in these datasets, the nodes and the edges correspond to genes and interactions (protein–protein and genetic interactions), respectively. The Facebook dataset is from (Viswanath et al., 2009). Table 2 shows dataset statistics, including the largest k-core (a.k.a. degeneracy). These are undirected and unweighted graphs from various applications: EE is an email communication network; FB is an online social network; WS is a Web graph; DB is a collaboration network; and CA is a product co-purchasing network. We also apply a random graph (ER) generated using the Erdos-Renyi model.

Algorithms: Greedy Cut (GC) and Shapley Value Based Cut (SV) are algorithms proposed in Sections 3.1 and 3.2, respectively. We also consider three baselines in our experiments. Low Jaccard Coefficient (JD) removes the edges with lowest Jaccard coefficient. Similarly, Low-Degree (LD) deletes edges for which adjacent vertices have the lowest degree. We also apply Random (RD), which simply deletes edges from the candidate set uniformly at random.

Quality evaluation metric: We apply DN(%), the percentage of vertices from the initial graph that leave the k-core after the deletion of the set B of edges produced by a KCM algorithm:

$DN(\%) = \frac{n_k(G) - n_k(G_B)}{n_k(G)} \times 100$   (4)

Default parameters: We set the candidate edge set Γ to the edges E_k between vertices in the k-core G_k. Unless stated otherwise, the value of the approximation parameter ε for SV is kept fixed and the number of samples is chosen according to Theorem 3.1.

Figure 5. K-core minimization (DN(%)) for different algorithms varying (a-d) the number of edges in the budget (on DB, WS, EE, and FB); (e-f) the core parameter k (on FB and WS); and (g-h) the sampling error ε (on FB and WS). Some combinations of experiments and datasets are omitted due to space limitations, but those results are consistent with the ones presented here. The Shapley Value Based Cut (SV) algorithm outperforms the best baseline (LD) by up to 6 times. On the other hand, the Greedy approach (GC) achieves worse results than the baselines, with the exception of RD, in most of the settings. The SV error increases smoothly with ε, and LD becomes a good alternative for large values of k.

4.2. Quality Evaluation

KCM algorithms are compared in terms of quality (DN(%)) for varying budget b, core value k, and the error ε of the sampling scheme applied by the SV algorithm.

Varying budget (b): Figures 5(a)-(d) present the k-core minimization results for a fixed value of k (similar results were found for other values of k) using four different datasets. SV outperforms the best baseline by up to six times. This is due to the fact that our algorithm can capture strong dependencies among sets of edges that are effective at breaking the k-core structure. On the other hand, GC, which takes into account only the marginal gains of individual edges, achieves worse results than simple baselines such as JD and LD. We also compare SV against the optimal algorithm on small graphs and show that SV produces near-optimal results (see the Appendix).

Varying core value (k): We evaluate the impact of k on quality for the algorithms using two datasets (FB and WS) in Figures 5(e) and 5(f). The budget b is kept fixed. As in the previous experiments, SV outperforms the competing approaches. However, notice that the gap between LD (the best baseline) and SV decreases as k increases. This is because the number of samples decreases for higher k, as the number of candidate edges also decreases, but it can be mended with a smaller ε. Also, a larger k increases the level of dependency between candidate edges, which in turn makes it harder to isolate the impact of a single edge (independent edges are the easiest to evaluate). On the other hand, a large value of k leads to a less stable k-core structure that can often be broken by the removal of edges with low-degree endpoints. LD is a good alternative for such extreme scenarios. Similar results were found for other datasets.

Varying the sampling error (ε): The parameter ε controls the sampling error of the SV algorithm according to Theorem 3.1. We show the effect of ε on the quality results for FB and WS in Figures 5(g) and 5(h). The values of k and b are kept fixed. The performance of the competing algorithms does not depend on this parameter and thus remains constant. As expected, DN(%) is inversely proportional to the value of ε for SV. The trade-off between ε and the running time of our algorithm enables both accurate and efficient selection of edges for k-core minimization.

4.3. Running Time

Here, we evaluate the running time of the GC and SV algorithms. In particular, we are interested in measuring the performance gains due to the pruning strategies described in Section 3.3. LD and JD do not achieve good quality results in general, as discussed in the previous section, thus we omit them from this evaluation.

Running times for SV varying the sampling error ε and the core parameter k using the FB dataset are given in Figures 6(a) and 6(b), respectively. Even for small error, the algorithm is able to process graphs with tens of thousands of vertices and millions of edges in, roughly, one minute. Running times decay as k increases due to two factors: (1) the size of the k-core structure decreases, and (2) pruning gets boosted by a less stable core structure.

Figure 6. Running times of SV on FB while varying (a) the sampling error ε and (b) the core parameter k; and (c-d) the impact of pruning for the GC and SV algorithms using three datasets. SV is efficient even for small values of the sampling error, and its running time decreases with k. GC is up to one order of magnitude faster with pruning, while SV is up to 50% faster.
Figure 7. Core resilience profiles for four different networks: (a) DB (co-authorship), (b) WS (Web graph), (c) FB (social), and (d) ER (random). ER and DB are the most and least stable networks, respectively. Tipping points are found for ER and DB.

In Figures 6(c) and 6(d), we look further into the effect of pruning for GC and SV by comparing versions of the algorithms with and without pruning using three datasets. GC becomes one order of magnitude faster using our optimization. Gains for SV are lower but still significant (up to 50%). We found in other experiments that the impact of pruning for SV increases with the budget, which is due to the larger number of permutations to be considered by the algorithm.

4.4. Application: -core Resilience

We show how KCM can be applied to profile the resilience, or stability, of real networks. A profile provides a visualization of the resilience of the k-core structure of a network for different combinations of k and budget b. We apply DN(%) (Equation 4) as a measure of the percentage of the k-core removed by a certain amount of budget, relative to the immediately smaller budget value.

Figure 7 shows the results for four networks: a co-authorship network (DB), a Web graph (WS), a social network (FB), and a random graph (ER). We also discuss profiles for Human and Yeast in the Appendix. Each cell corresponds to a given combination of k and budget b, and the color of the cell shows the difference in DN(%) between b and the immediately smaller budget value for that k. As colors are relative, we also show the range of values associated with the color scheme.

This is a summary of our main findings:

Stability: ER (Figure 7(d)) is the most stable graph, as can be noticed from the range of values in its profile; the majority of the nodes in ER are concentrated in its densest core. DB (Figure 7(a)) is the least stable, but only for larger values of k, which is due to its large number of small cliques. The high-core structure of DB is quite unstable, with only a small fraction of the network remaining in the densest cores after the removal of a few edges.

Tipping points: We also look for large effects of edge removals within small variations in budget, for a fixed value of k. Such behavior is not noticed for WS and FB (Figures 7(b) and 7(c), respectively), for which the profiles are quite smooth. This is mostly due to the presence of fringe nodes at different levels of the k-core structure. On the other hand, ER produces the most prominent tipping points. This pattern is also found for DB.

5. Previous Work

k-core computation and applications: A k-core decomposition algorithm was first introduced by Seidman (Seidman, 1983). A more efficient solution, with time complexity linear in the number of edges, was presented by Batagelj et al. (Batagelj and Zaveršnik, 2011), and a distributed version was proposed in (Montresor et al., 2013). Sariyuce et al. (Saríyüce et al., 2013) proposed algorithms for k-core decomposition in streaming data. For the case of uncertain graphs, where edges have probabilities, efficient algorithms for the problem were introduced in (Bonnet et al., 2016). The k-core decomposition has been used in many applications. k-cores are often applied in the analysis and visualization of large-scale complex networks (Alvarez-Hamelin et al., 2006). Other applications include clustering and community detection (Giatsidis et al., 2014), characterizing the Internet topology (Carmi et al., 2007), and analyzing the structure of software systems (Zhang et al., 2010). In social networks, k-cores are usually associated with models of user engagement. Bhawalkar et al. (Bhawalkar et al., 2015) studied the problem of increasing the size of the k-core by anchoring a few vertices initially outside of it. Chitnis et al. (Chitnis et al., 2013) proved stronger inapproximability results for the anchoring problem. Malliaros et al. (Malliaros and Vazirgiannis, 2013) investigated user engagement dynamics via k-core decomposition.

Network Resilience/Robustness: Understanding the behavior of a complex system (e.g., the Internet, the power grid) under different types of attacks and failures has been a popular topic of study in network science (Callaway et al., 2000; Albert et al., 2004; Cohen et al., 2000). This line of work is mostly focused on non-trivial properties of network models, such as critical thresholds and phase transitions, assuming random or simple targeted modifications. Najjar et al. (Najjar and Gaudiot, 1990) and Smith et al. (Smith et al., 2011) apply graph theory to evaluate the resilience of computer systems, especially communication networks. An overview of different graph metrics for assessing robustness/resilience is given in (Ellens and Kooij, 2013). Malliaros et al. (Malliaros et al., 2012) proposed an efficient algorithm for computing network robustness based on spectral graph theory. The appropriate model for assessing network resilience and robustness depends on the application scenario, and comparing different such models is not the focus of our work.

Stability/resilience of k-cores: Adiga et al. (Adiga and Vullikanti, 2013) studied the stability of high cores in noisy networks. Laishram et al. (Laishram et al., 2018) recently introduced a notion of resilience in terms of the stability of k-cores against the deletion of random nodes/edges. If the rank correlation of core numbers before and after the removal is high, the network is core-resilient. They also provided an algorithm to increase resilience via edge addition. Notice that this is different from our problem, as we search for edges whose deletion destroys the stability of the k-core. Another related paper is the work by Zhang et al. (Zhang et al., 2017), whose goal is to find vertices such that their deletion reduces the k-core maximally. As in our setting, minimizing the k-core via edge deletions has been studied recently by Zhu et al. (Zhu et al., 2018). However, we show stronger inapproximability results (both in the traditional hardness and in the parameterized complexity setting) and provide a stronger algorithmic contribution, via Shapley values and randomization, that outperforms the methods in (Zhu et al., 2018).

Shapley Value (SV) and combinatorial problems: A Shapley value based algorithm was previously introduced for influence maximization (IM) (Narayanam and Narahari, 2011). However, IM can be approximated within a constant factor by a simple greedy algorithm due to submodularity (Kempe et al., 2003). In this paper, we use Shapley values to account for the joint effect of multiple edges in the solution of the KCM problem, for which we have shown stronger inapproximability results.

Other network modification problems: A set of network modification problems based on vertex upgrades to improve the delays on adjacent edges were introduced by Paik et al. (Paik and Sahni, 1995). These problems have since then attracted a significant amount of attention. Meyerson et al. (Meyerson and Tagiku, 2009) designed algorithms for the minimization of shortest path distances. Faster algorithms for the same problem were proposed in (Papagelis et al., 2011; Parotsidis et al., 2015). Demaine et al. (Demaine and Zadimoghaddam, 2010) studied the minimization of the diameter of a network and node eccentricity by adding shortcut edges. Recently, Lin et al. (Lin and Mouratidis, 2015) addressed the shortest path distance optimization problem via improving edge weights on undirected graphs. A node version of the problem has also been studied (Dilkina et al., 2011; Medya et al., 2016). Another related problem is to optimize node centrality by adding edges (Crescenzi et al., 2015; Ishakian et al., 2012). More examples include boosting or containing diffusion processes in networks. These were studied under different well-known diffusion models such as SIR (Tong et al., 2012), Linear Threshold (Khalil et al., 2014; Kuhlman et al., 2013) and Independent Cascade (Kimura et al., 2008; Bogunovic, 2012; Chaoji et al., 2012; Lin et al., 2017).

6. Conclusion

We have studied the k-core minimization (KCM) problem, which consists of finding a set of edges whose removal minimizes the size of the k-core structure. KCM was shown to be NP-hard, even to approximate within any constant factor when k ≥ 3. The problem is also W[2]-hard, and thus unlikely to be fixed-parameter tractable, meaning that it cannot be solved efficiently even when the number of deleted edges is small. Given such inapproximability results, we have proposed an efficient randomized heuristic based on Shapley values to account for the interdependence in the impact of the candidate edges. For the sake of comparison, we also proposed a simpler greedy algorithm, which cannot assess such strong dependencies in the effects of edge deletions.

We have evaluated our algorithms using several real graphs and shown that the Shapley value based approach outperforms competing solutions in terms of quality. The proposed algorithm is also efficient, enabling its application to graphs with hundreds of thousands of vertices and millions of edges in a matter of minutes on a desktop PC. We have also illustrated how KCM can be used to profile the resilience of networks to edge deletions.

7. Appendix

7.1. Proof for Theorem 1

Proof.

First, we sketch the proof for k = 1. Consider an instance of the NP-hard 2-MINSAT problem (Kohli et al., 1994), which is defined by a set of variables and a collection of clauses, each containing exactly two literals, where a literal is either a variable or its negation. The problem is to decide whether there exists a truth assignment to the variables that satisfies no more than a given number of clauses. To define a corresponding KCM instance, we construct the graph G as follows.

We create a vertex for each clause . The result is a set of vertices. For each variable , we create two vertices: one for the variable () and another for its negation (). Thus, a total of vertices, are produced. Moreover, whenever the literal , we add two edges, and to .

For , KCM consists of removing edges while maximizing the number of isolated vertices (-core, ). One can think of an edge in the KCM instance as a vertex in . Each vertex is connected to exactly two vertices (end points of the edge in the KCM instance) in the set . Satisfying a clause is equivalent to removing the corresponding vertex (deleting the edge in KCM) from . A vertex in will be isolated when all of its associated clauses (or vertices) in are satisfied (removed). If there is a truth assignment which satisfies no more than clauses in 2-MINSAT, that implies vertices can be isolated in by removing vertices (or deleting edges in KCM). If there is none, then vertices cannot be isolated by breaking edges in KCM.

To prove NP-hardness for k = 2, we can transform the k = 1 version of the problem into the k = 2 one. The transformation is very similar to the one described in (Zhang et al., 2017) and is thus omitted here. ∎

7.2. Proof for Theorem 3

Proof.

We sketch the proof for k = 3. A similar construction can be applied for the case k > 3.

Consider an instance of the W[2]-hard Set Cover problem (Bonnet et al., 2016), defined by a collection of subsets of a universal set of items. The problem is to decide whether there exist b subsets whose union covers the entire universe. To define a corresponding KCM instance, we construct the graph G as follows.

For each subset we create a cycle of vertices in . Edges are the added. We also add vertices to with eight edges where the four vertices to form a clique with six edges. The other two edges are . Moreover, for each , we create a cycle of vertices in . The added edges are . We also add vertices to with eight edges where the four vertices to form a clique with six edges. The other two edges are . Furthermore, edge will be added to if . Additionally, if the edges and will be added to . Clearly the reduction is in FPT. The candidate set of edges is . Figure 8 illustrates the structure of our construction for sets and .

Initially all nodes are in the -core. We claim that a set , with , is a cover iff where . Note that for any , if is removed, the nodes and node go to the -core. Moreover, if , then the nodes and node go to the -core after is removed. Now, if is a set cover, all the s will be in some and nodes will go into -core; so —any edges from would remove nodes. On the other hand, assume that after removing edges in . The only way to have nodes removed from corresponding is if and . Thus, nodes will be removed, making a set cover. This proves our claim. ∎

Figure 8. Example construction for the parameterized hardness reduction from Set Cover.

7.3. Algorithm 4

This procedure computes the vulnerable set VS(e, G_k), i.e., the set of nodes that will leave the k-core upon deletion of the edge e = (u, v) from G_k. The size of this set is essentially the marginal gain of deleting e. If e is deleted, u will be removed iff its remaining degree falls below k (the same holds for v). This triggers a cascade of node removals from the k-core (with the associated edges). Let R(w) be the set of nodes, already removed from G_k, that are neighbours of node w. We observe that w will be removed if d(w, G_k) − |R(w)| < k. Note that the procedure is similar to Algorithm 2 (LocalUpdate) and has the same asymptotic running time.

Input: G_k(V_k, E_k), k, e = (u, v)
Output: VS(e, G_k)
1  VS ← ∅, Q ← ∅; c(x) ← 0 for all x ∈ V_k
2  c(u) ← 1, c(v) ← 1   (each endpoint loses the deleted edge)
3  if d(u, G_k) − c(u) < k then
4      add u to Q
5  if d(v, G_k) − c(v) < k then
6      add v to Q
7  while Q ≠ ∅ do
8      remove a node w from Q and add it to VS
9      for each neighbour x of w in G_k such that x ∉ VS and (w, x) ≠ e do
10         c(x) ← c(x) + 1
11         if d(x, G_k) − c(x) < k and x ∉ Q then
12             add x to Q
13 return VS
Algorithm 4 computeVS

7.4. Optimizations for GC and SV

Here, we provide more details on the optimizations for the Greedy Cut (GC) and Shapley Value Based Cut (SV) algorithms introduced in Section 3.3. We propose a general pruning technique to speed up both Algorithms 1 and 3 (GC and SV). In GC, all the candidate edges are evaluated in each step; how can we reduce the number of evaluations in a single step? In SV, marginal gains are computed for all the candidate edges in each permutation; how can we skip edges whose marginal gain is zero? To answer these questions, we use the concept of edge dominance. Let VS(e, G_k) be the set of vertices that would be removed if e were deleted from G_k due to the k-core constraint. If an edge e′ has one of its endpoints in VS(e, G_k), then e′ is dominated by e.

Observation 1.

If e′ is dominated by e, then VS(e′, G_k) ⊆ VS(e, G_k).

In Algorithm 1 (GC), while evaluating the edges in the candidate set, if e′ comes after e and is dominated by it, then the evaluation of e′ can be skipped, since |VS(e′, G_k)| ≤ |VS(e, G_k)| (Observation 1). As an example, consider a graph whose candidate set contains all of its edges and where initially all the nodes are in the k-core: any edge with an endpoint in the vulnerable set of another candidate edge is dominated by that edge. In Algorithm 3 (SV), while computing the marginal gains of the edges in a coalition for a particular permutation π, assume that e′ appears after e and is dominated by it. Since the vertices in VS(e′, G_k) have already been removed by the time e′ is considered, its marginal gain is zero (Observation 1), and its computation can be skipped.

For SV, we consider only the non-dominated edges in a permutation and normalize their contribution by their number of appearances in the sampled set. Notice that these optimizations do not affect the output of the algorithms. We evaluate the performance gains due to pruning in Section 4.3. The sketch below illustrates the dominance test.
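A minimal Java sketch of the dominance test follows (an illustration with our own naming; it assumes the vulnerable set of the dominating edge has already been computed, e.g., by a routine analogous to computeVS in Algorithm 4):

import java.util.*;

// Edge-dominance test used for pruning in GC and SV (sketch).
// An edge eOther = (u, v) is dominated by an edge e whose vulnerable set is 'vsOfE'
// if at least one of u, v belongs to vsOfE; its own vulnerable set is then a subset
// of vsOfE, so its evaluation can be skipped.
public class EdgeDominance {

    static boolean isDominated(int[] eOther, Set<Integer> vsOfE) {
        return vsOfE.contains(eOther[0]) || vsOfE.contains(eOther[1]);
    }

    // Filter a list of candidate edges, keeping only those not dominated by 'e'.
    static List<int[]> pruneDominated(List<int[]> candidates, Set<Integer> vsOfE) {
        List<int[]> kept = new ArrayList<>();
        for (int[] c : candidates) {
            if (!isDominated(c, vsOfE)) kept.add(c);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Toy usage: suppose deleting edge e demotes vertices {4, 5} out of the k-core.
        Set<Integer> vsOfE = new HashSet<>(Arrays.asList(4, 5));
        List<int[]> candidates = Arrays.asList(new int[]{1, 2}, new int[]{4, 6}, new int[]{5, 7});
        System.out.println(pruneDominated(candidates, vsOfE).size()); // prints 1
    }
}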