Approximation Algorithms for Reducing the Spectral Radius To Control Epidemic Spread

Sudip Saha (1, 2), Abhijin Adiga (1), B. Aditya Prakash (2), Anil Kumar S. Vullikanti (1, 2)
(1) NDSSL, Virginia Bioinformatics Institute, Virginia Tech.
(2) Department of Computer Science, Virginia Tech.
Email: {ssaha, abhijin, akumar}@vbi.vt.edu, badityap@cs.vt.edu

Abstract

The largest eigenvalue of the adjacency matrix of a network (referred to as the spectral radius) is an important metric in its own right. Further, for several models of epidemic spread on networks (e.g., the ‘flu-like’ SIS model), it has been shown that an epidemic dies out quickly if the spectral radius of the graph is below a certain threshold that depends on the model parameters. This motivates a strategy to control epidemic spread by reducing the spectral radius of the underlying network.

In this paper, we develop a suite of provable approximation algorithms for reducing the spectral radius by removing the minimum cost set of edges (modeling quarantining) or nodes (modeling vaccinations), with different time and quality tradeoffs. Our main algorithm, GreedyWalk, is based on the idea of hitting closed walks of a given length, and gives an $O(\log^2 n)$-approximation, where $n$ denotes the number of nodes; it also performs much better in practice compared to all prior heuristics proposed for this problem. We further present a novel sparsification method to improve its running time.

In addition, we give a new primal-dual based algorithm with an even better approximation guarantee ($O(\log n)$), albeit with slower running time. We also give lower bounds on the worst-case performance of some of the popular heuristics. Finally, we demonstrate the applicability of our algorithms and the properties of our solutions via extensive experiments on multiple synthetic and real networks.


1 Introduction

Given a contact network, which contacts should we remove to contain the spread of a virus? Equivalently, in a computer network, which connections should we cut to prevent the spread of malware? Designing effective, low-cost interventions is a fundamental challenge in public health and network security. Epidemics are commonly modeled by stochastic diffusion processes, such as the so-called ‘SIS’ (flu-like) and ‘SIR’ (mumps-like) models on networks (more in Section 2). An important result that highlights the impact of the network structure on the dynamics is that epidemics die out “quickly” if $\rho(G) < T$, where $\rho(G)$ is the spectral radius (or the largest eigenvalue) of graph $G$, and $T$ is a threshold that depends on the disease model [14, 39, 31]. This motivates the following strategy for controlling an epidemic: remove edges (quarantining) or nodes (vaccinating) to reduce the spectral radius below a threshold $T$. We refer to this as the spectral radius minimization (SRM) problem, with variants depending on whether edges are removed (the SRME problem) or whether nodes are removed (the SRMN problem). Van Mieghem et al. [28] and Tong et al. [37] prove that this problem is NP-complete. They also study two heuristics for it, one based on the components of the first eigenvector (EigenScore) and another based on degrees (ProductDegree). However, no rigorous approximations were known for the SRME or the SRMN problems.

Our main contributions.

1. Lower bounds on the worst-case performance of heuristics: We show that the ProductDegree, EigenScore and Pagerank heuristics (defined formally in Section 2) can perform quite poorly in general. We demonstrate graph instances where these heuristics give solutions of cost times the optimal, where is the number of nodes in the graph.

2. Provable approximation algorithms: We present two bicriteria approximation algorithms for the SRME and SRMN problems, with varying approximation quality and running time tradeoffs. Our first algorithm, GreedyWalk, is based on hitting closed walks in $G$. We show this algorithm has an approximation bound of $O(\log n \log \Delta)$ times optimal for the cost of edges removed, while ensuring that the spectral radius becomes at most $(1+\epsilon)$ times the threshold, for $\epsilon$ arbitrarily small (here $\Delta$ denotes the maximum node degree in the graph). We also design a variant, GreedyWalkSparse, that performs careful sparsification of the graph, leading to similar asymptotic guarantees, but better running time, especially when the threshold is small. We then develop algorithm PrimalDual, which improves this approximation bound to $O(\log n)$ using a more sophisticated primal-dual approach, at the expense of a slightly higher (but polynomial) running time.

3. Extensions: We consider two natural extensions of the SRME problem: (i) non-uniform transmission rates on edges and (ii) node version SRMN. We show that our methods extend to these variations too.

4. Empirical analysis: We conduct an extensive experimental evaluation of GreedyWalk, a simplified version of PrimalDual and different heuristics that have been proposed for epidemic containment on a diverse collection of synthetic and real networks. These heuristics involve picking edges in non-increasing order of some kind of score; the specific heuristics we compare include: (i) ProductDegree, (ii) EigenScore, (iii) LinePagerank, and (iv) Hybrid, which picks the edge based on either the eigenscore or the product-degree ordering, depending on the maximum decrease in eigenvalue. We find that GreedyWalk performs better than all the heuristics in all the networks we study. We analyze GreedyWalk for walks of length ; in practice, we found that the performance degrades significantly as is reduced.

Organization. The background and notation are defined in Section 2. Sections 3, 4 and 5 cover the GreedyWalk, GreedyWalkSparse and PrimalDual algorithms, respectively, for the SRME problem; the SRMN problem is discussed in Section 6. Some of the algorithmic details and proofs are omitted for brevity and are available in [35]. Lower bounds for some heuristics and the experimental results are discussed in Sections 7 and 8, respectively. We discuss the related work in Section 9 and conclude in Section 10.

2 Preliminaries

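Symbol | Description
$G = (V, E)$ | Graph representing a contact network
$n$ | Total number of nodes in $G$
$d(v, G)$ | Degree of node $v$ in $G$
$\Delta$ | Maximum node degree in $G$
$A$ | Adjacency matrix of $G$
$G[V']$ | Subgraph of $G$ induced on $V'$
$\lambda_i(G)$ | $i$th largest eigenvalue of $A$
$\rho(G)$ | $\lambda_1(G)$, spectral radius of $G$
$c(\cdot)$ | Cost of a vertex or edge of $G$
$\beta$ | Infection rate
$\delta$ | Recovery rate
$T$ | Epidemic threshold, $T = \delta/\beta$
$\tau$ | Time to epidemic extinction
$\mathcal{W}_k(G)$ | Set of closed walks of length $k$ in $G$
$nodes(w)$ | Number of distinct nodes in walk $w$
$W(e, G, k)$ | Number of closed $k$-walks in $G$ containing edge (or vertex) $e$
$E_{OPT}(T)$ | Optimal solution to SRME
Table 1: Notations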

We consider undirected graphs , and interventions to control the spread of epidemics— vaccination (modeled by removal of nodes) and quarantining (modeled by removal of edges). There can be different costs for the removal of nodes and edges (denoted by and , respectively), e.g., depending on their demographics, as estimated by [26]. For a set , denotes the total cost of the set (similarly for node subsets).

There are a number of models for epidemic spread; we focus on the fundamental SIS (Susceptible-Infectious-Susceptible) model, which is defined in the following manner. Nodes are in susceptible (S) or infectious (I) state. Each infected node (in state I) causes each susceptible neighbor (in state S) to become infected at rate . Further, each infected node switches to the susceptible state at rate . In this paper, we assume a uniform rate for all ; in this case, we define a threshold , which characterizes the time to extinction. Let denote the adjacency matrix of , and let . Let denote the th largest eigenvalue of , and let denote the spectral radius of . Since is undirected, it follows that all eigenvalues are real, and (see, e.g., Chapter 3 of [27]). Ganesh et al. [14] showed that the epidemic dies out in time , if in the SIS model, with high probability; this threshold was also observed by [39]. Prakash et al. [31] show this condition holds for a broad class of other epidemic models, including the SIR model (which contains the ‘Recovered’ state). Now we formally define the SRM problem.
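The threshold condition above can be checked numerically. The following is a minimal sketch, assuming the threshold takes the standard SIS form $T = \delta/\beta$ (recovery rate over infection rate); the function names `spectral_radius` and `below_threshold` and the example graph are hypothetical and only serve as illustration.

```python
import numpy as np

def spectral_radius(adj):
    """Largest eigenvalue of a symmetric, non-negative adjacency matrix."""
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    return float(np.linalg.eigvalsh(adj)[-1])

def below_threshold(adj, beta, delta):
    """True if rho(G) < delta / beta (assumed form of the threshold T)."""
    return spectral_radius(adj) < delta / beta

# Star with one hub and four leaves: its spectral radius is sqrt(4) = 2.
star = np.zeros((5, 5))
star[0, 1:] = 1.0
star[1:, 0] = 1.0
rho_star = spectral_radius(star)
```

With these definitions, `below_threshold(star, beta=0.2, delta=0.5)` compares 2 against 2.5, while a more infectious setting such as `beta=0.5` compares 2 against 1 and fails the condition.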

Definition 2.1

Spectral Radius Minimization problems (SRME and SRMN): Given an undirected graph , with cost for each edge , and a threshold , the goal of the SRME problem is to find the cheapest subset such that . We refer to the node version of this problem as SRMN.

We discuss some notation that will be used in the rest of the paper. denotes an optimal solution to the SRME problem. Let denote the set of closed walks of length in ; let . For a walk , let denote the number of distinct nodes in . A standard result (see, e.g., Chapter 3 of [27]) is the following:

(2.1)

The number of walks in containing a node is . For a graph , let denote the number of closed -walks in containing . Then, . We say that an edge set hits a walk if contains an edge from . Similarly, for a node , let denote the number of closed -walks in containing . Then, . Table 1 summarizes the frequently used notations.
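The link between closed walks and eigenvalues can be illustrated with a small numerical check. This sketch relies on the standard identity that the trace of the $k$th power of the adjacency matrix equals the sum of the $k$th powers of its eigenvalues, and that this trace counts closed walks of length $k$ with a distinguished start node and direction. That counting convention is an assumption here and may differ from the convention behind (2.1); the function names are hypothetical.

```python
import numpy as np

def closed_walk_count(adj, k):
    """Number of rooted closed walks of length k, computed as trace(A^k)."""
    return int(round(np.trace(np.linalg.matrix_power(adj, k))))

def eigen_power_sum(adj, k):
    """Sum of the k-th powers of the eigenvalues of a symmetric matrix."""
    return float(np.sum(np.linalg.eigvalsh(adj) ** k))

# Triangle: eigenvalues are 2, -1, -1.
triangle = np.array([[0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]])
walks_3 = closed_walk_count(triangle, 3)  # 2^3 + (-1)^3 + (-1)^3 = 6
walks_4 = closed_walk_count(triangle, 4)  # 2^4 + 1 + 1 = 18
```

For even $k$ every term in the eigenvalue sum is non-negative, so the walk count is an upper bound on the $k$th power of the spectral radius; this is the property the walk-hitting approach exploits.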

3 GreedyWalk: -approximation

Main idea. Our starting point is the connection between the number of closed walks in a graph and the sum of powers of the eigenvalues in (2.1). We try to reduce the spectral radius by reducing the number of closed walks of length $k$ in the graph, by removing edges (see Algorithm 1). This, in turn, can be viewed as a partial covering problem. (Footnote 1: This is a variation of the set cover problem, in which an instance consists of (i) a set of elements, (ii) a collection of sets over these elements, (iii) a cost for each set, and (iv) a coverage parameter. The objective is to find the cheapest collection of sets that covers at least the required number of elements. Slavík [36] shows that a greedy algorithm gives a logarithmic-factor approximation.) Our basic idea extends to other versions, as discussed later in Section 6.

0:  , , , even
0:  Edge set
1:  Initialize
2:  while  do
3:     
4:     Pick that maximizes
5:     
6:  end while
Algorithm 1 GreedyWalk (high level description)
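The high-level loop of Algorithm 1 might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: walks are counted as `trace(A^k)` (rooted closed walks), the stopping target is taken to be $n T^k$, and the number of walks through an edge is obtained by recomputing the trace with that edge removed, which is simple but slow. The names `closed_walks` and `greedy_walk` are hypothetical.

```python
import numpy as np

def closed_walks(adj, k):
    """Rooted closed walks of length k, as trace(A^k)."""
    return float(np.trace(np.linalg.matrix_power(adj, k)))

def greedy_walk(adj, cost, T, k):
    """adj: symmetric 0/1 array; cost: {(u, v): positive cost}; k: even."""
    A = adj.astype(float).copy()
    target = A.shape[0] * T ** k          # assumed stopping target n * T^k
    total = closed_walks(A, k)
    removed = []
    while total > target:
        need = total - target              # walks that still must be hit
        best, best_ratio, best_total = None, -1.0, total
        for (u, v), c in cost.items():
            if A[u, v] == 0:
                continue                   # edge already removed
            A[u, v] = A[v, u] = 0.0        # tentatively delete the edge
            rest = closed_walks(A, k)
            A[u, v] = A[v, u] = 1.0        # restore it
            ratio = min(need, total - rest) / c
            if ratio > best_ratio:
                best, best_ratio, best_total = (u, v), ratio, rest
        u, v = best
        A[u, v] = A[v, u] = 0.0            # commit the best edge
        removed.append(best)
        total = best_total
    return removed, A

# Complete graph on 5 nodes (spectral radius 4), unit costs, threshold 2.
k5 = np.ones((5, 5)) - np.eye(5)
unit_cost = {(u, v): 1.0 for u in range(5) for v in range(u + 1, 5)}
removed, residual = greedy_walk(k5, unit_cost, T=2.0, k=4)
```

Because $k$ is even, the residual walk count bounds the $k$th power of the spectral radius, so on exit the spectral radius of `residual` is at most $n^{1/k} T$ under these assumptions.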

The Lemma below proves the approximation bound for any solution (say ) from GreedyWalk. Let denote the graph resulting after the removal of edges in . Our proof involves three steps: (1) Proving the bound on ; (2) Relating to the cost of the optimum solution to the partial covering problem which ensures that the number of walks in the residual graph is at most ; (3) Showing that the optimum solution to the SRME problem also ensures that at most remain in the residual graph.

Lemma 3.1

Let denote the set of edges found by Algorithm GreedyWalk. Given any constant , let be an even integer larger than . Then, we have , and .

  • We follow the proof scheme mentioned above. By the stopping condition of the algorithm, we have . From (2.1), we have , which implies . Further, since is even (by assumption), , so that . This implies . Since , we have , so that .

    Next, we derive a bound for . Observe that the algorithm can be viewed as solving a partial cover problem, in which (i) the set of elements corresponds to walks in , and (ii) there is a set corresponding to each edge consisting of all the walks in that contain . Following the analysis of the greedy algorithm for partial cover [36], we have , where denotes the optimum solution for this covering instance. Since denotes the maximum node degree, we have . We show below that ; it follows that .

    Finally, we prove that . By definition of , we have . Let . Then, we have

    This implies hits at least walks, so that .

Effect of the walk length . We set the walk length for some constant in Algorithm GreedyWalk; understanding the effect of is a natural question. From the proof of Lemma 3.1, it follows that can be bounded by for any choice of , as long as it is even. This bound becomes worse as becomes smaller, e.g., it is for . This is borne out in the experiments in Section 8.

In order to complete the description of GreedyWalk (Algorithm 1 ), we need to design an efficient method to determine the edge which maximizes the quantity in line 4. We discuss two methods below.

3.1 Matrix multiplication approach for implementing GreedyWalk.

Note that . We use matrix multiplication to compute once for each iteration of the while loop in line 2 of Algorithm 1 . In line 4, we iterate over all edges, in order to compute the edge that maximizes the given ratio. For , can be computed in time , where is the exponent for the running time of the best matrix multiplication algorithm [40]. Therefore, each iteration involves time. This gives a total running time of , since only edges are removed. One drawback with this approach is the high (super-linear) space complexity, even with the best matrix multiplication methods, in general.

3.2 Dynamic programming approach for implementing GreedyWalk.

When the graphs are very sparse ( edges), we adapt a dynamic programming approach to compute for an edge and more efficiently select the edge that maximizes in line 4 of Algorithm 1. Although potentially needs to be computed for each edge , in practice it suffices to compute it for only a small subset of . We make use of the fact that for any subgraph . The approach is briefly as follows. Initially we compute for each and arrange the edges in non-increasing order of their value, . After the first edge is removed (i.e., in the first iteration), is computed on the residual graph only for some consecutive edges in that order up to some such that . Edges are reordered based on the recomputed walk numbers, and then the same steps are repeated. The approach takes space and time assuming the number of edges is in large real-world networks. The detailed algorithm and its analysis are given in Appendix A.1.
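The reordering idea described here resembles lazy re-evaluation with a priority queue: since removing edges can only decrease the number of walks through any remaining edge, a stale score is an upper bound, and an edge needs to be rescored only when it reaches the top of the queue. The sketch below illustrates that idea only. It is not the algorithm of Appendix A.1, it uses matrix powers rather than dynamic programming to count walks, and the stopping target $n T^k$ and all names are assumptions.

```python
import heapq
import numpy as np

def closed_walks(A, k):
    """Rooted closed walks of length k, as trace(A^k)."""
    return float(np.trace(np.linalg.matrix_power(A, k)))

def walks_through(A, u, v, k, total):
    """Closed k-walks using edge (u, v): total minus the count without it."""
    A[u, v] = A[v, u] = 0.0
    rest = closed_walks(A, k)
    A[u, v] = A[v, u] = 1.0
    return total - rest

def lazy_greedy_walk(adj, cost, T, k):
    A = adj.astype(float).copy()
    target = A.shape[0] * T ** k           # assumed stopping target
    total = closed_walks(A, k)
    # Max-heap via negated scores; initial scores come from the full graph.
    heap = [(-walks_through(A, u, v, k, total) / c, (u, v))
            for (u, v), c in cost.items() if A[u, v]]
    heapq.heapify(heap)
    removed = []
    while total > target and heap:
        need = total - target
        _, (u, v) = heapq.heappop(heap)
        fresh = min(need, walks_through(A, u, v, k, total)) / cost[(u, v)]
        if heap and fresh < -heap[0][0]:
            # Another edge may still beat this one: reinsert with fresh score.
            heapq.heappush(heap, (-fresh, (u, v)))
            continue
        A[u, v] = A[v, u] = 0.0            # fresh score beats all stale bounds
        removed.append((u, v))
        total = closed_walks(A, k)
    return removed, A

k5 = np.ones((5, 5)) - np.eye(5)
unit_cost = {(u, v): 1.0 for u in range(5) for v in range(u + 1, 5)}
lazy_removed, lazy_residual = lazy_greedy_walk(k5, unit_cost, T=2.0, k=4)
```

The design choice is to trade a full rescan per iteration for a handful of rescoring steps near the top of the queue, which tends to help when only a few edges carry most of the walks.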

4 Using sparsification for faster running time: Algorithm GreedyWalkSparse

The efficiency of Algorithm GreedyWalk can be improved if the number of edges in the graph can be reduced. This can be achieved by two pruning steps, which remove edges so that in the residual graph (i) no node has degree more than , and (ii) there is no -core; the -core of a graph denotes the maximal subgraph of with minimum degree (see, e.g., [3]). We will refer to these steps as MaxDegreeReduction and DensityReduction, respectively. This leads to sparser graphs, without affecting the asymptotic approximation guarantees. The procedure is described in Algorithm GreedyWalkSparse.

0:  
0:  Edge set
1:  Initialize .
2:  //Pruning step 1: MaxDegreeReduction
3:  Let .
4:  for  do
5:     if  then
6:        Let be the edges incident on ordered so that .
7:        Let .
8:         and .
9:     end if
10:  end for
11:  //pruning step 2: DensityReduction
12:  Let denote the -core of .
13:  Order the edges in non-decreasing order of cost.
14:  
15:  //GreedyWalk on Pruned Graph:
16:  
17:  
18:  
Algorithm 2 Algorithm GreedyWalkSparse
Lemma 4.1

Let and denote the set of edges removed in the pruning steps MaxDegreeReduction and DensityReduction, respectively. Then, and are both at most .

  • Since [27], which implies . Therefore, , where the sum is the minimum cost of edges that can be removed to ensure that the degree of becomes at most . Therefore,

    Recall that the second pruning step is applied on . For bounding , we use another lower bound for : for any induced subgraph of , . Therefore, the existence of a -core implies that . Since the average degree of in the residual graph is at least , it implies that at least edges must be removed from . Therefore,

    where, the correspond to the first edges of least cost. Hence proved.

By Lemma 4.1, it follows that the approximation bounds of Lemma 3.1 still hold. However, the pruning steps reduce the number of edges, thereby speeding the implementation of GreedyWalk. We discuss the empirical performance of pruning in Section 8. We show below that pruning also improves the approximation factor marginally from to which could be significant when is large and .
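The two pruning steps can be sketched with plain dictionaries. The sketch assumes the standard bounds that the spectral radius is at least the square root of the maximum degree and at least the minimum degree of any subgraph, which suggest a degree cap of about $T^2$ and the absence of any core with minimum degree above $T$. The exact constants used in Algorithm 2 are not recoverable from the text here, so the `cap` and `k` arguments are left to the caller, and the function names are hypothetical.

```python
def cap_degrees(neighbors, cost, cap):
    """Remove the cheapest incident edges of every node whose degree exceeds cap.

    neighbors: {node: set of nodes}; cost: {frozenset((u, v)): cost}.
    """
    nbrs = {v: set(nb) for v, nb in neighbors.items()}
    removed = []
    for v in nbrs:
        excess = len(nbrs[v]) - cap
        if excess > 0:
            cheapest = sorted(nbrs[v], key=lambda u: cost[frozenset((u, v))])
            for u in cheapest[:excess]:
                nbrs[v].discard(u)
                nbrs[u].discard(v)
                removed.append((v, u))
    return removed, nbrs

def k_core(neighbors, k):
    """Node set of the maximal subgraph with minimum degree at least k."""
    alive = {v: set(nb) for v, nb in neighbors.items()}
    stack = [v for v, nb in alive.items() if len(nb) < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue                       # already peeled
        for u in alive.pop(v):
            alive[u].discard(v)
            if len(alive[u]) < k:
                stack.append(u)
    return set(alive)

# K4 on nodes 0..3 plus a pendant node 4 attached to node 0.
graph = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0}}
edge_cost = {frozenset((u, v)): 2.0 for u in graph for v in graph[u]}
edge_cost[frozenset((0, 4))] = 1.0
pruned_edges, pruned_graph = cap_degrees(graph, edge_cost, cap=3)
core_nodes = k_core(graph, 3)
```

In this example the degree cap drops the cheap pendant edge at node 0, and the 3-core is the K4 on nodes 0 to 3; a density-reduction step would then remove low-cost edges inside that core.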

Lemma 4.2

Let denote the set of edges found by Algorithm GreedyWalkSparse. Given any constant , let be an even integer larger than . Then, we have , and .

  • From Lemma 4.1, the number of edges removed is at most . The residual graph has maximum degree less than . Therefore, applying Lemma 3.1 on , it follows that the number of edges removed is . Hence, the total number of edges removed by GreedyWalkSparse is at most .

5 PrimalDual: -approximation

Main idea: The approach of [13] gives an -approximation for the partial covering problem, where denotes the maximum number of sets that contain any element in the set system. As in the proof of Lemma 3.1, in our reduction from the SRME problem to partial covering, elements correspond to all the closed walks of length , while sets correspond to edges; for an edge , the corresponding set consists of all the walks that are hit by . In this reduction, each walk lies in sets; therefore, for this set system. Therefore, the approach of [13] could improve the approximation factor. Unfortunately, our set system has size , so that the algorithm of [13] cannot be used directly to get a polynomial time algorithm.

0:  Edge set
1:  Initialize for all , .
2:  // for all walks in .
3:  while  is not -feasible do
4:     ; let be an edge for which the minimum is reached.
5:     
6:     For each :
7:        // for all walks in that pass through
8:     
9:  end while
Algorithm 3 PrimalDual

The algorithm of Gandhi et al. [13] uses a primal-dual approach, which maintains dual variables for each element (i.e., walk); these are increased gradually, and a set (i.e., an edge) is picked if the sum of duals corresponding to the elements in the set equals its cost. We now discuss how to adapt this algorithm to run in polynomial time, and only focus on polynomial time implementation of the PrimalDual subroutine of [13] in detail here. However, we also present the set cover algorithm HitWalks for completeness. This algorithm iterates over all edges and invokes PrimalDual in each iteration to obtain a candidate set of edges to remove and finally chooses the set with minimum cost. , , and denote the set of elements (walks) to be covered, the sets (corresponding to edges that can be chosen), the costs corresponding to the sets/edges and the number of elements (walks) that need to be covered, respectively. A subset is -feasible if . Let denote the dual variables corresponding to the walks ; these are not maintained in the algorithm explicitly, but assigned in the comments, for use in the analysis.

0:  Set of all -closed walks , walks corresponding to edges , edge cost set , number of walks to hit
0:  Edge set
1:  Sort the edges of in increasing order of their costs.
2:  Initialize ,
3:  for  to  do
4:      and compute
5:      . //cost of edge set in this iteration
6:     if  then
7:         
8:         
9:     end if
10:     
11:     
12:  end for
Algorithm 4 HitWalks

This algorithm does not explicitly update the dual variables, but the edges are picked in the same sequence as in [13].
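The dual-raising step can be illustrated on a small explicit set system. The sketch below raises the duals of all uncovered elements uniformly until some set becomes tight, picks that set, and repeats until enough elements are covered. It is a simplified illustration of the primal-dual idea only: it omits the guessing loop of HitWalks and the other refinements of [13], it stores the set system explicitly rather than implicitly through walk counts, and it assumes `required` does not exceed the number of elements. All names are hypothetical.

```python
def primal_dual_partial_cover(sets, cost, required):
    """sets: {name: set of elements}; cost: {name: positive cost}."""
    slack = dict(cost)                      # cost minus sum of duals in the set
    uncovered = set().union(*sets.values())
    n_elements = len(uncovered)
    dual = {x: 0.0 for x in uncovered}
    picked = []
    while n_elements - len(uncovered) < required:
        # Sets that still contain uncovered elements.
        live = {s: len(sets[s] & uncovered) for s in sets if s not in picked}
        live = {s: m for s, m in live.items() if m > 0}
        # Largest uniform raise before some set becomes tight.
        step, tight = min((slack[s] / m, s) for s, m in live.items())
        for x in uncovered:
            dual[x] += step
        for s, m in live.items():
            slack[s] -= step * m
        picked.append(tight)
        uncovered -= sets[tight]
    return picked, dual

# Toy instance: four "edges", each hitting a few "walks" numbered 1..6.
sets = {"e1": {1, 2, 3}, "e2": {3, 4}, "e3": {4, 5, 6}, "e4": {1, 6}}
cost = {"e1": 1.0, "e2": 1.0, "e3": 1.0, "e4": 1.0}
picked, dual = primal_dual_partial_cover(sets, cost, required=5)
```

Tracking one slack value per set, instead of one dual per element, mirrors the observation in the text that the per-walk duals need not be stored explicitly; the `dual` dictionary is kept here only so the feasibility invariant can be inspected.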

Lemma 5.1

Given any constant , let be an even integer larger than . The dual variables in algorithm PrimalDual are maintained and updated as in [13], and the edge picked in each iteration is the same. We have and .

  • Instead of updating the dual variable for each element (walk) , as done in [13], the variable corresponding to each set (edge) is updated in algorithm PrimalDual at the end of each iteration. It is easy to see that, the following is an invariant at the end of each iteration, . Also note that, a set is picked into the cover in PrimalDual, whenever .

    Therefore, increasing the ’s has the same effect as increasing the ’s in terms of picking the sets into the cover, and both the algorithm PrimalDual and the one in [13] choose the same set in each iteration.

6 Node Version

Our discussion so far has focused on the SRME problem. We now consider extensions which capture two kinds of issues arising in practice.

1. Non-uniform transmission rates.

In general, the transmission rate is not constant for all the edges. The transmission rate for edge depends on individual properties, especially the demographics of the end-points and , such as age, e.g., [26]. Let denote the matrix of the transmission rates. This gives us the SRME-nonuniform problem, which is defined as follows: Given an undirected graph , with transmission rate for each and recovery rate , find the smallest set such that . We extend the spectral radius characterization of [14, 39, 31] to handle this setting, and show that GreedyWalk can also be adapted for solving SRME-nonuniform, with the same guarantees. The details of the algorithm, lemma and proofs are discussed in the appendix A.2.

2. The node removal version (SRMN problem).

We extend the GreedyWalk algorithm in a natural manner to work for SRMN, with the same approximation guarantees. For the details, please see the appendix A.3.

7 Popular heuristics and lower bounds

A number of heuristics have been developed for controlling the spread of epidemics; these are discussed below. All these heuristics involve ordering the edges based on some kind of score, and then selecting the top few edges based on this score. We describe the score function in each heuristic.

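  1. ProductDegree ([28]): The score for edge $e = (u, v)$ is defined as $d(u) \cdot d(v)$, the product of the degrees of its endpoints. Edges are removed in non-increasing order of this score.

  2. EigenScore ([28, 37]): Let $x$ be the eigenvector corresponding to the first eigenvalue of the graph. The score for edge $e = (u, v)$ is $x_u \cdot x_v$.

  3. LinePagerank: This method uses the line graph of graph $G$, in which each edge of $G$ becomes a node and two such nodes are adjacent if the corresponding edges have a common endpoint. We define the score of edge $e$ as the pagerank of the corresponding node in the line graph.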

As we find in Section 8.2, these heuristics work well for different kinds of networks. We design another heuristic, Hybrid, which picks the best of the EigenScore and ProductDegree methods. The edges are ordered in the following manner: (1) Let and be orderings of edges in the Eigenscore and ProductDegree algorithms, respectively. (2) Initialize and , and (3) from the edges and , remove the one which decreases the max eigenvalue of the residual graph more. Increment the corresponding index.
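The two score-based heuristics can be sketched in a few lines. The sketch assumes that the ProductDegree score of an edge is the product of its endpoint degrees and that the EigenScore of an edge is the product of the principal-eigenvector entries of its endpoints; the absolute value is taken so that the sign convention of the eigensolver does not matter, which is appropriate for connected graphs. Function names are hypothetical.

```python
import numpy as np

def edge_list(adj):
    """Edges (u, v) with u < v of a symmetric 0/1 adjacency matrix."""
    rows, cols = np.nonzero(np.triu(adj))
    return [(int(u), int(v)) for u, v in zip(rows, cols)]

def product_degree_scores(adj):
    """Score of (u, v) is deg(u) * deg(v)."""
    deg = adj.sum(axis=1)
    return {(u, v): float(deg[u] * deg[v]) for u, v in edge_list(adj)}

def eigen_scores(adj):
    """Score of (u, v) is x_u * x_v for the principal eigenvector x."""
    _, vecs = np.linalg.eigh(adj)          # eigenvalues in ascending order
    x = np.abs(vecs[:, -1])
    return {(u, v): float(x[u] * x[v]) for u, v in edge_list(adj)}

# Path on four nodes: 0 - 1 - 2 - 3.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
pd_scores = product_degree_scores(path)
es_scores = eigen_scores(path)
```

On the path, both heuristics rank the middle edge (1, 2) first. A Hybrid-style selection would compare the top candidate of each ordering by recomputing the largest eigenvalue after tentatively removing it.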

We have examined the worst case performance of these heuristics. Two of these, namely, EigenScore and ProductDegree, have been used specifically for reducing the spectral radius, e.g., [28, 37]. No formal analysis is known for any of these heuristics in the context of the SRME or SRMN problems; some of them seem to work pretty well on real world networks. We show that the worst case performance of these heuristics can be quite poor, in general.

Theorem 7.1

Given any sufficiently large positive integer , there exists a threshold , for some constant and a graph of size for which the number of edges removed by ProductDegree, EigenScore, Hybrid and LinePagerank is .

The proof is presented in appendix A.4.

8 Experiments

8.1 Methods and Dataset

We evaluate the algorithms developed in the paper, GreedyWalk, GreedyWalkSparse and PrimalDual (all code at: http://tinyurl.com/l3lgsq7), and compare their performance with the heuristics from the literature: EigenScore, ProductDegree, LinePagerank and Hybrid (described in Section 7), the last serving as a more sophisticated baseline. The networks considered in our empirical analysis are listed in Table 2, spanning infrastructure networks, social networks and random graphs.

Network nodes edges
Barabasi-Albert
Erdos-Renyi
P2P (Gnutella05)
P2P (Gnutella06)
Collab. Net (HepTh)
Collab. Net (GrQc)
AS (Oregon 1)
AS (Oregon 2)
Brightkite Net
Youtube Network
Stanford Web graph
Table 2: Networks and their sizes. The first two are synthetic random networks; others are taken from [2] and [1]

8.2 Experimental results

Performance of our algorithms and comparison with other heuristics: We first compare the quality of solution from our algorithms with the EigenScore, ProductDegree, LinePagerank and Hybrid heuristics in Figure 1. We note that GreedyWalk is consistently better than all other heuristics, especially as the target threshold becomes smaller. Compared to the EigenScore, ProductDegree and LinePagerank heuristics, the spectral radius for the solution produced by GreedyWalk, as a function of the fraction of edges removed, is lower by at least 10-20%. Our improved baseline, the Hybrid heuristic, works better than the other heuristics, and comes somewhat close to GreedyWalk in many networks.

Though PrimalDual gives a significantly better approximation guarantee, compared to GreedyWalk, it has a much higher running time. Therefore, we only evaluate it for one iteration of Algorithm HitWalks. Figure 2 shows that PrimalDual is quite close to GreedyWalk after just one iteration; we expect running this algorithm fully would further improve the performance, but additional work is needed to improve the running time.

Figure 1: Comparison between the GreedyWalk, ProductDegree, EigenScore, LinePagerank and Hybrid algorithms for different networks. Each plot shows the spectral radius (y-axis) as a function of the fraction of edges removed (x-axis). Panels: (a) AS Oregon-1, (b) AS Oregon-2, (c) Collaboration GrQc, (d) P2P Gnutella-5, (e) P2P Gnutella-6, (f) Brightkite, (g) Portland, (h) Youtube, (i) Stanford Web. The LinePagerank heuristic has not been evaluated in panels (g), (h) and (i) because of the scale of these networks.
Figure 2: GreedyWalk vs PrimalDual. Each plot shows the spectral radius (y-axis) as a function of the number of edges removed (x-axis) using the two methods. Panels: (a) Collaboration GrQc, (b) P2P Gnutella-5.
Figure 3: Total running time of GreedyWalk method (y-axis) as a function of (x-axis), where is the threshold and is the spectral radius of the initial graph, without any edges removed.

Running time and effect of sparsification: Figure 3 shows the total running time of GreedyWalk for three networks. The time decreases with the increase of , because the while loop in Algorithm GreedyWalk needs to be run for fewer iterations. The high running time motivates faster methods. We evaluate the performance of the GreedyWalkSparse algorithm. As shown in Figure 4, GreedyWalkSparse gives almost the same quality of approximation as GreedyWalk, but improves the running time by up to an order of magnitude, particularly when is small.

(a) #Edges removed
(b) Execution time
Figure 4: Impact of sparsification on GreedyWalk. The plots show for AS Oregon-1 network, (a) the number of edges removed and (b) the execution time on the y-axis, as a function of (x-axis), where is the threshold and is the spectral radius of the initial graph, without any edges removed.

Effect of varying walk lengths: As discussed in Section 3, the walk length parameter is critical for the performance of GreedyWalk. Figure 5 shows the approximation quality in the Oregon-2 and collaboration networks. We find that as becomes smaller, the approximation quality degrades significantly, and the best performance occurs at close to .

(a) AS Oregon-2
(b) Collaboration GrQc
Figure 5: Impact of walk length on GreedyWalk performance. Each plot shows the drop in spectral radius (y-axis) with number of edges removed (x-axis), for different values of , ranging from to , for the corresponding networks.

Extensions: For the SRME-nonuniform problem, we compare the adaptation of GreedyWalk, as discussed in Section 6, with the EigenScore heuristic run on the matrix of transmission rates. As shown in Figure 6(b), we find that GreedyWalk performs much better. Next we consider the SRMN problem, and compare GreedyWalk, as adapted in Section 6, with the node versions of the Degree and EigenScore heuristics [37]. As shown in Figure 6(d), GreedyWalk performs consistently better. For results in other networks, see the full version [35].

Figure 6: Computing solutions for the SRME-nonuniform (panels (a), (b)) and SRMN (panels (c), (d)) problems on different networks with the GreedyWalk algorithm and the Degree and EigenScore heuristics as adapted in Section 6. The plots show the resultant spectral radius (y-axis) as fractions of edges/nodes are removed (x-axis) with different methods. Panels: (a) SRME-nonuniform: Barabasi-Albert, (b) SRME-nonuniform: Collaboration GrQc, (c) SRMN: Oregon-1, (d) SRMN: Collaboration GrQc.

Demographic properties of removed nodes and edges: GreedyWalk can also help in getting non-network surrogates for picking nodes/edges. We analyzed the demographic properties of the nodes and edges removed by GreedyWalk on the Portland contact network [1]. By doing so, we can hope to use such demographic properties directly, for quicker implementation and/or when the entire network is not readily available. Figure 7 shows the age groups of the end points of the top selected edges by GreedyWalk as a matrix. Age-groups are partitioned according to [26] and shown in Table 3. As the figure shows, the edges among age-group # (ages ) and with age-groups # (age ) and # (age ) are picked to a greater extent by GreedyWalk. We observe that the edges picked by GreedyWalk have substantially different properties compared to other heuristics. Figure 8 shows the age groups of the nodes removed by the GreedyWalk algorithm for the SRMN problem, along with the age group distribution of the entire population. Observe that more people are selected in age-group numbers 7 to 11, which correspond to ages 25-49.

Age-group | Age | Age-group | Age
1 | 0 | 10 | 40-44
2 | 1-4 | 11 | 45-49
3 | 5-9 | 12 | 50-54
4 | 10-14 | 13 | 55-59
5 | 15-19 | 14 | 60-64
6 | 20-24 | 15 | 65-69
7 | 25-29 | 16 | 70-74
8 | 30-34 | 17 | 75+
9 | 35-39 | |
Table 3: Age-groups [26]
Figure 7: Age-group matrix of the top 1500 removed edges. Panels: (a) Removed Edges, (b) ProductDegree, (c) ProductEigenscore, (d) Hybrid.
Figure 8: Age-group of 1500 removed nodes with GreedyWalk from the Portland contact graph. Panel: (a) Removed Nodes.
Figure 9: (a) Age-group matrix of the top 1500 removed edges and (b) age-group of 1500 removed nodes with GreedyWalk from the Portland contact graph. Panels: (a) Removed Edges, (b) Removed Nodes.

Main observations:

1. GreedyWalk performs consistently better than existing heuristics in removing nodes or edges in both static and variable transmission rate settings.

2. Sparsification helps in improving the speed of GreedyWalk without affecting the solution quality.

3. GreedyWalk performs best for walk-lengths of .

4. GreedyWalk can potentially help in picking more accurate non-network surrogates.

9 Related Work

Related work comes from multiple areas: epidemiology, immunization algorithms and other optimization algorithms. There is general research interest in studying dynamic processes on large graphs, (a) blogs and propagations [17, 22], (b) information cascades [15, 16] and (c) marketing and product penetration [34]. These dynamic processes are all closely related to virus propagation.

Epidemiology: A classical text on epidemic models and analysis is by May and Anderson [4]. Most work in epidemiology is focused on homogeneous models [6, 4]. Here we study network based models. Much work has gone into finding epidemic thresholds (minimum virulence of a virus which results in an epidemic) for a variety of networks [29, 39, 14, 31].

Immunization: There has been much work on finding optimal strategies for vaccine allocation [7, 25, 11]. Cohen et al. [12] studied the popular acquaintance immunization policy (pick a random person, and immunize one of its neighbors at random). Using game theory, Aspnes et al. [5] developed inoculation strategies for victims of viruses under random starting points. Kuhlman et al. [21] studied two formulations of the problem of blocking a contagion through edge removals under the model of discrete dynamical systems. As already mentioned, Tong et al. [38, 37], Van Mieghem et al. [28], Prakash et al. [30] and Chakrabarti et al. [9] proposed various node-based and edge-based immunization algorithms based on minimizing the largest eigenvalue of the graph. Other non-spectral approaches for immunization have been studied by Budak et al. [8], He et al. [18] and Khalil et al. [20].

Other Optimization Problems: Other diffusion-based optimization problems include the influence maximization problem, which was introduced by Domingos and Richardson [33], and formulated by Kempe et al. [19] as a combinatorial optimization problem. They proved that it is NP-hard and gave a simple approximation algorithm based on the submodularity of the expected spread of a set of starting seeds. Other problems in which we wish to select a subset of ‘important’ vertices on graphs include ‘outbreak detection’ [24] and ‘finding most-likely culprits of epidemics’ [23, 32].

10 Conclusions

We study the problem of reducing the spectral radius of a graph to control the spread of epidemics by removing edges (the SRME problem) or nodes (the SRMN problem). We have developed a suite of algorithms that give the first rigorous bounds for these problems. Our main algorithm, GreedyWalk, performs consistently better than all other heuristics in all networks we studied. We also develop variants that improve the running time by sparsification, and improve the approximation guarantee using a primal-dual approach. These algorithms exploit the connection between the graph spectrum and closed walks in the graph. Improving the running time of these algorithms is a direction for further research. We expect that these techniques could help in optimizing other objectives related to spectral properties, e.g., robustness [10], and in other problems related to the design of interventions to control the spread of epidemics.

Acknowledgments. This work has been partially supported by the following grants: DTRA Grant HDTRA1-11-1-0016, DTRA CNIMS Contract HDTRA1-11-D-0016-0010, NSF Career CNS 0845700, NSF ICES CCF-1216000, NSF NETSE Grant CNS-1011769, DOE DE-SC0003957, National Science Foundation Grant IIS-1353346 and Maryland Procurement Office contract H98230-14-C0127. It was also supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center (DoI/NBC) contract number D12PC000337. The US Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.
Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC, or the US Government.

References

  • [1] CINET: Cyber infrastructure for network science.
  • [2] SNAP: Stanford network analysis project.
  • [3] A. Adiga and A. Vullikanti. How robust is the core of a network? In Proc. of ECMLPKDD, 2013.
  • [4] R. M. Anderson and R. M. May. Infectious Diseases of Humans. Oxford University Press, 1991.
  • [5] J. Aspnes, K. Chang, and A. Yampolskiy. Inoculation strategies for victims of viruses and the sum-of-squares partition problem. In Proc. of ACM SODA, 2005.
  • [6] N. Bailey. The Mathematical Theory of Infectious Diseases and its Applications. Griffin, London, 1975.
  • [7] L. Briesemeister, P. Lincoln, and P. Porras. Epidemic profiles and defense of scale-free networks. WORM, Oct 2003.
  • [8] C. Budak, D. Agrawal, and A. E. Abbadi. Limiting the spread of misinformation in social networks. In WWW, 2011.
  • [9] D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, and C. Faloutsos. Epidemic thresholds in real networks. ACM Transactions on Information and System Security (TISSEC), 2008.
  • [10] H. Chan, H. Tong, and L. Akoglu. Make it or break it: Manipulating robustness in large networks. In Proc. of SDM, pages 325–333, 2014.
  • [11] P. Chen, M. David, and D. Kempe. Better vaccination strategies for better people. In Proc. of ACM Conference on Electronic Commerce (EC). ACM, 2010.
  • [12] R. Cohen, S. Havlin, and D. ben Avraham. Efficient immunization strategies for computer networks and populations. Physical Review Letters, 91(24), 2003.
  • [13] R. Gandhi, S. Khuller, and A. Srinivasan. Approximation algorithms for partial covering problems. Journal of Algorithms, 53(1):55 – 84, 2004.
  • [14] A. Ganesh, L. Massoulie, and D. Towsley. The effect of network topology on the spread of epidemics. In Proc. of INFOCOM, 2005.
  • [15] J. Goldenberg, B. Libai, and E. Muller. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing Letters, 2001.
  • [16] M. Granovetter. Threshold models of collective behavior. Am. Journal of Sociology, 83(6):1420–1443, 1978.
  • [17] D. Gruhl, R. Guha, D. Liben-Nowell, and A. Tomkins. Information diffusion through blogspace. In Proc. of WWW, 2004.
  • [18] X. He, G. Song, W. Chen, and Q. Jian. Influence blocking maximization in social networks under the competitive linear threshold model. In SDM, 2012.
  • [19] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In Proc. of KDD, New York, NY, 2003. ACM Press.
  • [20] E. Khalil, B. Dilkina, and L. Song. Scalable diffusion-aware optimization of network topology. In KDD, 2014.
  • [21] C. J. Kuhlman, G. Tuli, S. Swarup, M. V. Marathe, and S. S. Ravi. Blocking simple and complex contagion by edge removal. In Proc. of ICDM, pages 399–408, 2013.
  • [22] R. Kumar, J. Novak, P. Raghavan, and A. Tomkins. On the bursty evolution of blogspace. In Proc. of WWW, 2003.
  • [23] T. Lappas, E. Terzi, D. Gunopulos, and H. Mannila. Finding effectors in social networks. In Proc. of SIGKDD, pages 1059–1068, 2010.
  • [24] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. S. Glance. Cost-effective outbreak detection in networks. In Proc. of KDD, 2007.
  • [25] N. Madar, T. Kalisky, R. Cohen, D. ben Avraham, and S. Havlin. Immunization and epidemic dynamics in complex networks. Eur. Phys. J. B, 38(2):269–276, 2004.
  • [26] J. Medlock and A. P. Galvani. Optimizing influenza vaccine distribution. Science, 325(5948), 2009.
  • [27] P. V. Mieghem. Spectral Graph Theory. Cambridge University Press, 2011.
  • [28] P. V. Mieghem, D. Stevanovic, F. F. Kuipers, C. Li, R. van de Bovenkamp, D. Liu, and H. Wang. Decreasing the spectral radius of a graph by link removals. IEEE Transactions on Networking, 2011.
  • [29] R. Pastor-Satorras and A. Vespignani. Epidemic dynamics in finite size scale-free networks. Physical Review E, 65:035108, 2002.
  • [30] B. A. Prakash, L. A. Adamic, T. J. Iwashyna, H. Tong, and C. Faloutsos. Fractional immunization in networks. In SDM, pages 659–667, 2013.
  • [31] B. A. Prakash, D. Chakrabarti, M. Faloutsos, N. Valler, and C. Faloutsos. Threshold conditions for arbitrary cascade models on arbitrary networks. Knowledge and Information Systems, 2012.
  • [32] B. A. Prakash, J. Vreeken, and C. Faloutsos. Spotting culprits in epidemics: How many and which ones? In ICDM, 2012.
  • [33] M. Richardson and P. Domingos. Mining knowledge-sharing sites for viral marketing. In Proc. of KDD, 2002.
  • [34] E. M. Rogers. Diffusion of Innovations, 5th Edition. Free Press, August 2003.
  • [35] S. Saha, A. Adiga, B. A. Prakash, and A. Vullikanti. Reducing the spectral radius to control epidemic spread. Technical report, available at http://staff.vbi.vt.edu/ssaha/papers/eigext.pdf.
  • [36] P. Slavik. Improved performance of the greedy algorithm for partial cover. Information Processing Letters, 1997.
  • [37] H. Tong, B. A. Prakash, T. Eliassi-Rad, M. Faloutsos, and C. Faloutsos. Gelling, and melting, large graphs by edge manipulation. In CIKM, 2012.
  • [38] H. Tong, B. A. Prakash, C. E. Tsourakakis, T. Eliassi-Rad, C. Faloutsos, and D. H. Chau. On the vulnerability of large graphs. In ICDM, 2010.
  • [39] Y. Wang, D. Chakrabarti, C. Wang, and C. Faloutsos. Epidemic spreading in real networks: An eigenvalue viewpoint. In Symposium on Reliable Distributed Systems, pages 25–34, Los Alamitos, CA, 2003. IEEE Computer Society Press.
  • [40] V. V. Williams. Multiplying matrices faster than Coppersmith-Winograd. In Proc. of STOC, 2012.

Appendix A

A.1 GreedyWalk with Dynamic Programming Approach

Main idea: We adapt a dynamic programming approach for sparse graphs that avoids matrix multiplication; this lowers the space complexity and thereby allows us to scale to larger graphs. We then observe that the number of walks does not need to be recomputed each time an edge is deleted.

Let denote the number of walks of length from node through edge as the first edge to node in . It is easy to see that, . Algorithm ClosedWalkDP describes how to compute . In the algorithm, denotes the neighbors of node in .

0:  
0:  Number of closed walks of length in containing
1:  Let , ,
2:  for  to  do
3:     ,
4:  end for
5:  return
Algorithm 5 ClosedWalkDP
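The dynamic-programming idea can be illustrated with a minimal Python sketch. It is not the authors' ClosedWalkDP routine: the function names are hypothetical, the graph is assumed to be simple and undirected, and the walk length `k` is assumed to be at least 1. The sketch counts closed walks by propagating walk counts along adjacency lists, one step at a time, so no matrix product is formed and only one vector of counts per start node is held in memory. The number of closed walks containing an edge is then obtained as the difference between the counts with and without that edge.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Adjacency sets of a simple undirected graph given as (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def count_closed_walks(edges, k):
    """Number of closed walks of length k (counted per start node and direction)."""
    adj = build_adjacency(edges)
    total = 0
    for start in adj:
        # walks[x] = number of walks of the current length from start to x
        walks = {start: 1}
        for _ in range(k):
            nxt = defaultdict(int)
            for x, c in walks.items():
                for y in adj[x]:
                    nxt[y] += c
            walks = nxt
        total += walks.get(start, 0)
    return total

def closed_walks_through_edge(edges, e, k):
    """Closed k-walks using edge e at least once: all walks minus those avoiding e."""
    u, v = e
    remaining = [f for f in edges if set(f) != {u, v}]
    return count_closed_walks(edges, k) - count_closed_walks(remaining, k)
```

For a triangle, every closed walk of length 3 uses all three edges, so `closed_walks_through_edge` returns the same value as `count_closed_walks` for `k = 3`.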

Next, we describe in Algorithm GreedyEdgeChoice how the greedy edge choice in line 4 of Algorithm GreedyWalk is implemented efficiently. We make use of the fact that for any . In every iteration of Algorithm GreedyEdgeChoice, potentially, we need to update for all edges in . However, in practice, we observe that the number of such updates is very small compared to .

0:  
0:  Edge set
1:  Initialize and , let //computed by ClosedWalkDP
2:  while  do
3:     Order edges of in the decreasing order of values. Let be the first edge.
4:     
5:     for  do
6:        Update .
7:        if  then
8:           Exit from the for loop
9:        end if
10:     end for
11:  end while
Algorithm 6 GreedyEdgeChoice
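A lazy-evaluation scheme in the spirit of GreedyEdgeChoice can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the names `lazy_greedy_edges` and `target` are hypothetical, the stopping rule (remove edges until the number of closed walks of length `k` drops below `target`) stands in for the loop condition of the algorithm, and the sketch relies on the observation that removing edges can never increase the number of closed walks through a remaining edge, so stale scores are valid upper bounds.

```python
import heapq
from collections import defaultdict

def count_closed_walks(edges, k):
    """Closed walks of length k, computed by propagating counts over adjacency lists."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0
    for start in adj:
        walks = {start: 1}
        for _ in range(k):
            nxt = defaultdict(int)
            for x, c in walks.items():
                for y in adj[x]:
                    nxt[y] += c
            walks = nxt
        total += walks.get(start, 0)
    return total

def lazy_greedy_edges(edges, k, target):
    """Remove edges greedily until fewer than `target` closed k-walks remain."""
    current = list(edges)              # edges still present (assumed distinct)
    total = count_closed_walks(current, k)

    def score(e):
        # closed k-walks through e in the current graph
        rest = [f for f in current if f != e]
        return total - count_closed_walks(rest, k)

    # max-heap via negated scores; the index breaks ties deterministically
    heap = [(-score(e), i, e) for i, e in enumerate(current)]
    heapq.heapify(heap)
    removed = []
    while total >= target and heap:
        _, i, e = heapq.heappop(heap)
        fresh = score(e)               # refresh only the top candidate
        if heap and fresh < -heap[0][0]:
            # another edge may now be better: reinsert with the updated score
            heapq.heappush(heap, (-fresh, i, e))
            continue
        removed.append(e)              # fresh score beats every stale upper bound
        current.remove(e)
        total -= fresh
    return removed
```

Only the candidates that reach the top of the heap are rescored, which mirrors the remark above that few updates are needed in practice.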

Running time and space complexity: Let , . Note that, takes time to compute . Therefore, computing for all the edges takes , assuming in real world networks. Since, for computing , , needs to look only at , , therefore, the space complexity is .

A.2 Non-uniform transmission rates

Let denote the matrix of the transmission rates. We assume the rates are symmetric, i.e., . In this case, the sufficient condition for the epidemic to die out is slightly different, and is stated below.

Lemma A.1

Let be the matrix of transmission rates, and let be the recovery rate in the SIS model. If , the time to extinction, satisfies

For the case of uniform costs, i.e., for all edges , this motivates the following problem:

Definition A.1

SRME-nonuniform problem: Given an undirected graph , with transmission rate for each and recovery rate , find the smallest set such that .

In this section, we use to denote the optimum solution to SRME-nonuniform. Our algorithm GreedyWalk-nonuniform adapts GreedyWalk to a weighted covering problem. We need to refine the definitions used earlier. For walk , let denote its weight, where is the number of occurrences of edge in walk ; for a set of walks, let denote the total weight of . In the algorithm, we will need to compute , which is done by modifying the recurrence used in Algorithm CountWalks to compute :

Let denote the total weight of walks containing edge ; . Algorithm GreedyWalk-nonuniform involves the following steps:

  • while :

    • Pick the that maximizes .
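The weighted greedy step can be sketched in Python under explicit assumptions. The weight of a walk is taken here to be the product of the transmission rates of its edges, each counted as many times as the edge occurs in the walk, which is how the definition above is read. The names `beta`, `target`, and the three functions are hypothetical, and the stopping threshold `target` stands in for the loop condition of the algorithm, which is not reproduced here. Costs are uniform, as in the problem statement.

```python
from collections import defaultdict

def total_walk_weight(beta, k):
    """Total weight of closed walks of length k; beta maps edge (u, v) -> rate."""
    adj = defaultdict(dict)
    for (u, v), b in beta.items():
        adj[u][v] = b
        adj[v][u] = b                  # rates are assumed symmetric
    total = 0.0
    for start in adj:
        # weight[x] = total weight of walks of the current length from start to x
        weight = {start: 1.0}
        for _ in range(k):
            nxt = defaultdict(float)
            for x, w in weight.items():
                for y, b in adj[x].items():
                    nxt[y] += w * b
            weight = nxt
        total += weight.get(start, 0.0)
    return total

def edge_walk_weight(beta, e, k):
    """Total weight of closed k-walks that contain edge e."""
    rest = {f: b for f, b in beta.items() if f != e}
    return total_walk_weight(beta, k) - total_walk_weight(rest, k)

def greedy_walk_nonuniform(beta, k, target):
    """Repeatedly remove the edge carrying the largest walk weight."""
    beta = dict(beta)                  # work on a copy
    removed = []
    while beta and total_walk_weight(beta, k) >= target:
        best = max(beta, key=lambda e: edge_walk_weight(beta, e, k))
        removed.append(best)
        del beta[best]
    return removed
```

This version rescores every edge in each iteration for clarity; the lazy evaluation used for the unweighted case would apply here as well, since edge removals can only decrease the walk weight through a remaining edge when all rates are nonnegative.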

Lemma A.2

Let denote the set of edges found by Algorithm GreedyWalk-nonuniform. Given any constant , let be an even integer greater than , we have and .

  • The bound on follows along the same lines as the proof of Lemma 3.1. The main difference is that the proof in [36] does not consider the case of weights associated with elements. But, as we argue now, the same approach for analyzing greedy algorithms extends to our case, and we show .

    We partition the iterations of Algorithm GreedyWalk-nonuniform into phases. Each phase ends at the first iteration at which the total weight that still needs to be covered has gone down by a factor of at least . So if is the weight that needs to be covered at the start of the phase, in every iteration of the phase, there exists an edge (which is in an optimum solution) such that . Thus, the total cost of the edges selected in the phase is at most . Since the ratio of over the minimum weight of a walk is polynomial in , the total number of phases is . Adding over all phases then yields the desired bound on . Putting this together with the rest of the proof of Lemma 3.1 yields the desired bound.

A.3 Node version: SRMN problem

Recall the definition of from Section 2. Let denote the subgraph of induced by subset . We modify Algorithm GreedyWalk to work for the SRMN problem in the following manner:

1:  Initialize
2:  while  do
3:     
4:     Pick that maximizes
5:     
6:  end while
Algorithm 7 GreedyWalkSRMN
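The node version can be sketched in Python in the same way. This is a hedged illustration, not the authors' code: the name `greedy_walk_srmn` and the stopping threshold `target` are hypothetical, node costs are assumed uniform, and each iteration removes the node through which the most closed walks of length `k` pass in the current induced subgraph.

```python
from collections import defaultdict

def count_closed_walks(adj, k):
    """Closed walks of length k in a graph given as {node: set of neighbors}."""
    total = 0
    for start in adj:
        walks = {start: 1}
        for _ in range(k):
            nxt = defaultdict(int)
            for x, c in walks.items():
                for y in adj[x]:
                    nxt[y] += c
            walks = nxt
        total += walks.get(start, 0)
    return total

def greedy_walk_srmn(edges, k, target):
    """Remove nodes greedily until fewer than `target` closed k-walks remain."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    adj = dict(adj)
    removed = []
    while adj:
        total = count_closed_walks(adj, k)
        if total < target:
            break

        def without(v):
            # subgraph induced by all remaining nodes except v
            return {x: nbrs - {v} for x, nbrs in adj.items() if x != v}

        # closed k-walks through v = all walks minus walks in the graph without v
        best = max(adj, key=lambda v: total - count_closed_walks(without(v), k))
        removed.append(best)
        adj = without(best)
    return removed
```

On a star graph the sketch removes the central vertex first, since every closed walk passes through it.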

It can be shown, along the same lines as the proof of Lemma 3.1, that this gives a solution of cost , where denotes the cost of the optimal solution to the SRMN problem. Further, the same running time bounds as in Sections 3.1 and 3.2 hold.

A.4 Proof of Theorem 7.1

Construction: We construct a graph for which the statement holds. For convenience, let us assume that is a positive integer. contains (1) a clique on nodes; (2) a caterpillar tree , which consists of a path with adjacent to leaves each; and (3) , a star graph with leaves and central vertex denoted by . We connect to by where is some node in and is connected to by the edge . Note that and . Again, here we assume that is an integer.

Bound on : We will show that . Removing the edges and isolates the components ,