Efficient sampling of spreading processes on complex networks using a composition and rejection algorithm

Guillaume St-Onge (guillaume.st-onge.4@ulaval.ca), Jean-Gabriel Young, Laurent Hébert-Dufresne, Louis J. Dubé
Département de Physique, de Génie Physique, et d’Optique, Université Laval, Québec (Québec), Canada, G1V 0A6
Centre interdisciplinaire de modélisation mathématique de l’Université Laval, Québec (Québec), Canada, G1V 0A6
Department of Computer Science and Vermont Complex Systems Center, University of Vermont, Burlington, VT 05401, USA
Abstract

Efficient stochastic simulation algorithms are of paramount importance to the study of spreading phenomena on complex networks. Using insights and analytical results from network science, we discuss how the structure of contacts affects the efficiency of current algorithms. We show that algorithms believed to require $\mathcal{O}(\log N)$ or even $\mathcal{O}(1)$ operations per update—where $N$ is the number of nodes—display instead a polynomial scaling for networks that are either dense, or sparse and heterogeneous. This significantly affects the required computation time for simulations on large networks. To circumvent the issue, we propose a node-based method combined with a composition and rejection algorithm, a sampling scheme that has an average-case complexity of $\mathcal{O}(\log \log N)$ per update for general networks. This systematic approach is first set up for Markovian dynamics, but can also be adapted to a number of non-Markovian processes and can enhance considerably the study of a wide range of dynamics on networks.

Keywords: Spreading processes, Complex networks, Stochastic simulation algorithms.

1 Introduction

Stochastic processes in a discrete state space are useful models for many natural and human-related complex systems. Considering that complex and heterogeneous connectivity patterns form the backbone of these systems, the study of dynamical processes on networks has grown in popularity over the last decades. Spreading processes are a prime example Pastor-Satorras et al. (2015); Wang et al. (2017); Kiss, Miller, and Simon (2017). They are used to model a wide range of phenomena related to propagation among a population. While the spread of disease Anderson and May (1991) is the most commonly studied process, other important examples include social contagions Morgan et al. (2017); Lehmann and Ahn (2018) and even beneficial epidemics Berdahl et al. (2016).

Many analytical approaches have been developed to study the outcome of spreading processes on complex networks, using mean field Boguñá and Pastor-Satorras (2002); Moreno, Pastor-Satorras, and Vespignani (2002); Van Mieghem, Omic, and Kooij (2009); Barrat, Barthelemy, and Vespignani (2008), moment closure Eames and Keeling (2002); Mata, Ferreira, and Ferreira (2014); Sharkey et al. (2015); St-Onge et al. (2018), percolation mapping Newman (2002); Kenah and Robins (2007); Parshani, Carmi, and Havlin (2010) and message passing techniques Karrer and Newman (2010); Shrestha, Scarpino, and Moore (2015) to name a few (see Refs. Pastor-Satorras et al. (2015); Wang et al. (2017); Kiss, Miller, and Simon (2017) for recent overviews). However, most results hold only for random networks and tree-like structures, or stand as approximations for general networks. This undeniably contributes to our understanding of real systems, but any conclusions drawn from these approaches need to be supported by exact, robust, numerical simulations.

However, an important limitation of numerical simulations is certainly their computational cost, which generally increases with the size of the system. Spreading processes are a class of nonequilibrium models that exhibit a phase transition in the thermodynamic limit (infinite size systems). This particular feature has motivated the need for simulation algorithms that can handle networks with a very large number of nodes $N$. This is especially true for the study of anomalous phenomena, such as Griffiths phases and smeared phase transitions Muñoz et al. (2010); Ódor (2014); Mata and Ferreira (2015); Cota, Ferreira, and Ódor (2016); Cota, Ódor, and Ferreira (2018); St-Onge et al. (2018), typically observed in large-size networks. Therefore, the development of stochastic simulation algorithms that can efficiently tackle large networks of varied structures is inescapable, although it offers a considerable, albeit fascinating, computational challenge. It is also critical to the foundations of the field of spreading processes, one of the cornerstones of network science.

One standard formulation of spreading on networks is in terms of a continuous-time stochastic process. A widely used numerical approach involves a temporal discretization using a finite time step. Despite its simplicity, this approach is both computationally inefficient—requiring $\mathcal{O}(N)$ operations per time step—and prone to discretization errors Fennell, Melnik, and Gleeson (2016). Instead, the state-of-the-art methods are based on the Doob-Gillespie algorithm Gillespie (1976), which produces statistically exact sequences of states. Many studies have been dedicated to the improvement of its efficiency Gibson and Bruck (2000); Slepoy, Thompson, and Plimpton (2008); Goutsias and Jenkinson (2013); Yates and Klingbeil (2013), and from these ideas have emerged some implementations for spreading processes Vestergaard and Génois (2015); Kiss, Miller, and Simon (2017); Cota and Ferreira (2017); Masuda and Rocha (2018); de Arruda, Rodrigues, and Moreno (2018). However, these approaches can still be inefficient for certain structures, for instance dense, or sparse and heterogeneous, networks.

In this work, we propose an efficient method for the simulation of Markovian spreading processes that builds on the Doob-Gillespie algorithm. It hinges on a composition and rejection scheme Slepoy, Thompson, and Plimpton (2008) that requires only $\mathcal{O}(\log \log N)$ operations per update for general networks, where $N$ is the number of nodes. We demonstrate that it provides a formidable improvement over other algorithms whose computation time can scale polynomially with the network size. An implementation of the algorithm is available online St-Onge (2018).

2 Markovian spreading processes on networks

For the sake of simplicity, we consider simple graphs of $N$ nodes and $M$ edges, although the methods discussed can all be applied to directed, weighted, and multi-layer networks. We consider spreading processes defined as continuous-time Markov processes with discrete states $X = (x_1, \dots, x_N)$. We focus on canonical epidemiological models to present the methods—other compartmental models can also be accommodated, for instance by adding accessible states for the nodes, but the numerical approaches are similar. Let us therefore denote the state of each node as $x_i \in \{S, I, R\}$: susceptible, infected, or recovered, respectively. If infected node $i$ and susceptible node $j$ are connected, node $i$ transmits the disease to node $j$ at rate $\lambda$. An infected node spontaneously recovers at rate $\mu$, and becomes recovered if it develops immunity against the spreading disease, or susceptible otherwise. From this framework, we distinguish two models with different properties in the limit $N \to \infty$.
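In this notation, the dynamics is fully specified by the following transition rates for a node $i$ (a compact restatement of the rules above; $n_i$, our notation, denotes the number of infected neighbors of node $i$):

\[ x_i : S \to I \quad \text{at rate } \lambda n_i, \qquad x_i : I \to S \text{ (SIS) or } I \to R \text{ (SIR)} \quad \text{at rate } \mu. \]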

SIS model

Infected nodes become directly susceptible after recovery (they do not develop immunity). For finite size networks, the absorbing state with all nodes susceptible, where the system remains trapped, is eventually visited. However, for infinite size networks (in practice, this behavior is observed for large finite networks), there exists a threshold $\lambda_c$ for fixed $\mu$ such that for all $\lambda > \lambda_c$ the system reaches an endemic state, where a stationary density of the nodes remains infected on average [see Fig. 1(a)]. This stationary density of infected nodes is hereafter referred to as the prevalence.

SIR model

Recovered nodes develop immunity. After a certain time, all nodes are either susceptible or recovered. Similarly to the SIS model, there exists a threshold value $\lambda_c$ above which even an infinitesimal initial density of infected nodes ultimately leads to a macroscopic number of recovered nodes that scales with the network size [see Fig. 1(b)]. The final density of recovered nodes is hereafter referred to as the final size.

Threshold estimates are provided in Appendix A for uncorrelated random networks with an arbitrary degree distribution.

Figure 1: Typical phase transition of the order parameters for the two canonical models of spreading. The recovery rate is fixed to $\mu = 1$ while the transmission rate $\lambda$ is varied. Simulations were performed using the composition and rejection algorithm (see Sec. 3.2.2) on a single realization of the $G(N,p)$ random graph ensemble Erdős and Rényi (1959). (a) Average prevalence in the endemic state for the SIS model. (b) Average final size for an outbreak of the SIR model, starting from a small initial density of infected nodes. In both cases, the error bars associated with the standard deviation are hidden by the markers.

3 Exact simulation algorithms

Spreading processes can be decomposed into a set of independent processes $\Omega(X)$ for each state $X$ of the system. A process can be the transmission of the disease from node $i$ to node $j$, or the recovery of an infected node. We can distinguish the set of transmission processes $\mathcal{T}(X)$ and the set of recovery processes $\mathcal{R}(X)$, such that $\Omega(X) = \mathcal{T}(X) \cup \mathcal{R}(X)$ and $\mathcal{T}(X) \cap \mathcal{R}(X) = \emptyset$. To establish a coherent framework for the stochastic simulation algorithms, we consider that the execution of a process is an event that may change the state of the system, but is not required to do so. For instance, the transmission of the disease from node $i$ to node $j$ is a process that implies $x_j \mapsto I$ if $x_j = S$. However, if $x_j \neq S$ prior to the transmission, then the state of the system stays the same. Considering or not the transmission of the disease between two infected nodes as an ongoing process is only an algorithmic decision; from our definition, an infected node transmits the disease to all of its neighbors, irrespective of their states.

Since every possible transmission and recovery for a state $X$ is an independent Poisson process with rate $\omega_j$, the total rate of events is defined as

\[ W(X) = \sum_{j \in \Omega(X)} \omega_j . \tag{1} \]

The inter-event time $\tau$ before the execution of any process is exponentially distributed,

\[ P(\tau) = W(X)\, e^{-W(X)\tau} , \tag{2} \]

and the executed process $j$ is chosen proportionally to its propensity, i.e. its rate,

\[ \Pr(j) = \frac{\omega_j}{W(X)} . \tag{3} \]
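For illustration, the following minimal sketch (not the authors' implementation; the rates are arbitrary placeholders) shows how Eqs. (1)-(3) translate into code: the inter-event time is drawn from an exponential distribution of rate $W(X)$, and the process is chosen by a linear scan of the cumulative rates.

#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> unif(0.0, 1.0);

    std::vector<double> rates = {1.0, 0.5, 2.0};  // hypothetical process rates
    double W = 0.0;                               // total event rate, Eq. (1)
    for (double w : rates) W += w;

    double tau = -std::log(1.0 - unif(rng)) / W;  // inter-event time, Eq. (2)

    double target = unif(rng) * W;                // proportional selection, Eq. (3)
    std::size_t j = 0;
    for (double cum = 0.0; j < rates.size(); ++j) {
        cum += rates[j];
        if (target <= cum) break;
    }
    if (j == rates.size()) j = rates.size() - 1;  // guard against rounding
    std::printf("next event: process %zu after tau = %.3f\n", j, tau);
    return 0;
}

Here the process is found by a linear scan, which costs $\mathcal{O}(|\Omega(X)|)$ per update; the data structures discussed in this section reduce this cost.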

In this section, we introduce and discuss different numerical methods to sample sequences of states based on the previous equations. To compare the methods and give insight into their behavior for large systems, we provide expressions for the theoretical number of operations required on average to perform a state transition, i.e. $X \to X'$ with $X' \neq X$. The expressions are upper bounds on the number of operations required on average and are expressed using the big $\mathcal{O}$ notation.

3.1 Doob-Gillespie’s algorithm (direct method)

The direct method consists of the determination of the next process and the time at which it will be executed. It can be summarized by the following steps:

  1. Determine the inter-event time $\tau$ using Eq. (2).

  2. Determine the executed process $j$ using Eq. (3).

  3. Update the state according to the chosen process (if $X' \neq X$), and update the time $t \mapsto t + \tau$.

  4. Update, create, or remove processes resulting from the execution of process $j$.

  5. Update $\Omega(X)$ and $W(X)$, then return to step 1.

Various implementations of this direct procedure have been discussed for spreading processes on static networks Fennell, Melnik, and Gleeson (2016); Cota and Ferreira (2017); Kiss, Miller, and Simon (2017); Masuda and Rocha (2018).

For general networks, the number of ongoing processes is $|\Omega(X)| = \mathcal{O}(M)$. Efficient implementations typically use a binary-tree data structure to store the processes and keep $W(X)$ updated Gibson and Bruck (2000); Masuda and Rocha (2018). This allows the insertion and the retrieval of processes in $\mathcal{O}(\log N)$ operations. The costliest updates involve the infection of a degree-$k$ node. To perform this update, the required number of operations is

\[ \mathcal{O}(k \log N) , \tag{4} \]

and is associated with the storage of future transmission processes.

A fact often overlooked is that the degree of a newly infected node can be large on average. For dense networks, it scales as $\mathcal{O}(N)$; for sparse and heterogeneous networks near the phase transition, it scales as $\mathcal{O}(N^{\nu})$ on average, with $\nu > 0$ (see Appendix A). This is problematic, since phase transitions are central to many studies.
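As an illustration of the bookkeeping above, here is a minimal sketch of a binary indexed (Fenwick) tree over the process rates—one possible realization of the binary-tree structure mentioned in the text, not necessarily the one used in the cited implementations. It supports the rate updates and the proportional sampling of Eq. (3) in $\mathcal{O}(\log n)$ operations for $n$ stored processes.

#include <cstdio>
#include <random>
#include <vector>

struct FenwickTree {
    std::vector<double> t;                          // partial sums of the rates
    explicit FenwickTree(std::size_t n) : t(n + 1, 0.0) {}
    void add(std::size_t i, double delta) {         // rates[i] += delta, O(log n)
        for (++i; i < t.size(); i += i & (~i + 1)) t[i] += delta;
    }
    double total() const {                          // W(X): sum of all rates
        double s = 0.0;
        for (std::size_t i = t.size() - 1; i > 0; i -= i & (~i + 1)) s += t[i];
        return s;
    }
    std::size_t sample(double target) const {       // process whose cumulative
        std::size_t pos = 0, bit = 1;               // interval contains target
        while ((bit << 1) < t.size()) bit <<= 1;
        for (; bit > 0; bit >>= 1) {
            if (pos + bit < t.size() && t[pos + bit] < target) {
                target -= t[pos + bit];
                pos += bit;
            }
        }
        return pos;                                 // 0-indexed process
    }
};

int main() {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    FenwickTree tree(4);
    tree.add(0, 1.0); tree.add(1, 0.5); tree.add(2, 2.0); tree.add(3, 0.25);
    double W = tree.total();
    std::size_t j = tree.sample(unif(rng) * W);     // proportional pick, Eq. (3)
    std::printf("W = %.2f, picked process %zu\n", W, j);
    return 0;
}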

3.2 Node-based method

Another type of implementation has been suggested Cota and Ferreira (2017); de Arruda, Rodrigues, and Moreno (2018). To sample among the infection processes, one selects a node $i$ among the infected nodes, proportionally to its degree $k_i$, then selects one of its neighbors $j$ uniformly at random and infects it if $x_j = S$. The sampling of recovery processes is simply done by selecting an infected node uniformly. Using this scheme, one does not have to look through all the neighbors of an infected node to check for new processes, which is associated with the $\mathcal{O}(k \log N)$ operations in the direct method.

We introduce what we call the node-based method in the spirit of this scheme. The idea is to regroup the propensity of all processes associated with a node $i$ into a single propensity $\omega_i$, in order to prevent their enumeration. The probability of selecting a process $j$ performed by node $i$ is then factorized as $\Pr(j) = \Pr(i)\Pr(j \mid i)$. The probability of selecting node $i$ is written

\[ \Pr(i) = \frac{\omega_i}{W(X)} , \tag{5} \]

where

\[ \omega_i = \delta_{x_i, I} \left( \mu + \lambda k_i \right) \tag{6} \]

is the total propensity for node $i$ to be involved in a transmission or recovery event, $\delta_{x,y}$ is the Kronecker delta, and $k_i$ is the degree of node $i$. The probability of selecting a process $j$ performed by node $i$ is formally written

\[ \Pr(j \mid i) = \frac{\omega_j}{\omega_i} . \tag{7} \]

In practice, we use the probability that the chosen process is a transmission,

\[ \Pr(\text{transmission} \mid i) = \frac{\lambda k_i}{\mu + \lambda k_i} , \tag{8} \]

to select the type of process. If it is a transmission, we select a random neighbor and infect it; if it is a recovery, the node becomes susceptible or recovered depending on the spreading model used. Note that the total rate of events [Eq. (1)] has an equivalent definition in terms of the propensities of the nodes, namely

\[ W(X) = \sum_{i} \omega_i . \tag{9} \]

An important property of this scheme in the context of spreading processes is that Eq. (6) depends only on the state of node $i$. Hence, when the state of a node changes due to the execution of a process, we do not need to update the propensity of the neighboring nodes, but only $\omega_i$. We discuss two implementations for this method.

3.2.1 Rejection sampling

Figure 2: Rejection area (gray) compared to the acceptance area (blue), for the selection of a node with the node-based method. (a) Rejection sampling. (b) Composition and rejection sampling.

A simple approach to select a node according to Eq. (5) is to use rejection sampling. First, one needs to determine the maximal possible propensity for a node, in this case being

\[ \omega_{\max} = \mu + \lambda k_{\max} , \tag{10} \]

where $k_{\max}$ is the maximal degree of a node in the network. Second, one needs to keep updating an array $V$ of pairs of elements $(i, \omega_i)$, the pairs being node labels and their propensity—we keep only pairs with non-zero propensity. Finally, one draws a pair at random from $V$ and accepts it with probability

\[ \frac{\omega_i}{\omega_{\max}} . \tag{11} \]

The complete procedure for the execution of a process using the node-based method with rejection sampling is presented in Algorithm 1. Although the SIS model was considered, one only needs to change step 19 to $x_i \mapsto R$ for it to be compliant with the SIR model.

Input: $X$, $V$, $W(X)$
Output: $\tau$ and updated input

1:choose $u_1 \in (0,1]$ uniformly at random
2:$\tau \leftarrow -\ln(u_1)/W(X)$
3:select a pair $(i, \omega_i)$ uniformly at random from $V$
4:choose $u_2 \in [0,1)$ uniformly at random
5:if $u_2 < \omega_i/\omega_{\max}$ then
6:     choose node $i$
7:else
8:     go back to step 3
9:end if
10:choose $u_3 \in [0,1)$ uniformly at random
11:if $u_3 < \lambda k_i/\omega_i$ then ▷ transmission process
12:     choose a neighbor $j$ of node $i$ uniformly at random
13:     if $x_j = S$ then
14:         $x_j \mapsto I$
15:         add $(j, \omega_j)$ to $V$
16:         $W(X) \mapsto W(X) + \omega_j$
17:     end if
18:else ▷ recovery process
19:     $x_i \mapsto S$
20:     remove $(i, \omega_i)$ from $V$
21:     $W(X) \mapsto W(X) - \omega_i$
22:end if
Algorithm 1 Node-based method with rejection sampling for an update of the SIS model
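To make the pseudocode concrete, here is a minimal C++ sketch of one such update on a toy graph (our illustration, not the benchmarked implementation; the graph and rates are arbitrary placeholders). For brevity it stores only node labels in $V$ and recomputes propensities on the fly, whereas an optimized implementation would store the (node, propensity) pairs.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

enum State { S, I };

int main() {
    std::mt19937 rng(1);
    std::uniform_real_distribution<double> unif(0.0, 1.0);

    // Toy graph: a path 0-1-2-3, node 0 initially infected.
    std::vector<std::vector<int>> adj = {{1}, {0, 2}, {1, 3}, {2}};
    std::vector<State> x = {I, S, S, S};
    double lambda = 0.8, mu = 1.0;                 // arbitrary placeholder rates
    double omega_max = mu + lambda * 2;            // Eq. (10), k_max = 2 here

    auto omega = [&](int i) {                      // node propensity, Eq. (6)
        return x[i] == I ? mu + lambda * adj[i].size() : 0.0;
    };
    std::vector<int> V = {0};                      // nodes with non-zero propensity
    double W = omega(0);                           // total rate, Eq. (9)

    double tau = -std::log(1.0 - unif(rng)) / W;   // steps 1-2

    int i;
    do {                                           // steps 3-9: rejection sampling
        i = V[(std::size_t)(unif(rng) * V.size())];
    } while (unif(rng) >= omega(i) / omega_max);

    if (unif(rng) < lambda * adj[i].size() / omega(i)) {  // steps 10-11, Eq. (8)
        int j = adj[i][(std::size_t)(unif(rng) * adj[i].size())];
        if (x[j] == S) {                           // otherwise: phantom event
            x[j] = I;
            V.push_back(j);
            W += omega(j);
        }
    } else {                                       // recovery (SIS: back to S)
        W -= omega(i);
        x[i] = S;
        V.erase(std::find(V.begin(), V.end(), i)); // swap-and-pop in practice
    }
    std::printf("tau = %.3f, infected nodes = %zu\n", tau, V.size());
    return 0;
}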

It is straightforward to verify that most steps in Algorithm 1 are $\mathcal{O}(1)$, except possibly the rejection sampling portion (steps 3 to 9). The average acceptance probability for a state $X$ is $\bar{\omega}(X)/\omega_{\max}$, where

\[ \bar{\omega}(X) = \mu + \lambda \langle k \rangle_I(X) \tag{12} \]

is the average propensity for infected nodes in state $X$, and $\langle k \rangle_I(X)$ stands for the average degree of infected nodes in state $X$. Since the number of required trials follows a geometric distribution, the average number of operations for an update from a state $X$ is

\[ \mathcal{O}\!\left( \frac{\omega_{\max}}{\bar{\omega}(X)} \right) . \tag{13} \]

For networks with a homogeneous degree distribution, we have $\omega_{\max}/\bar{\omega}(X) = \mathcal{O}(1)$ for all states, leading to a small number of operations. However, this rejection sampling scheme is vulnerable to cases where the propensity ranges over multiple scales—more specifically, when $\omega_{\max} \gg \bar{\omega}(X)$ for typical states $X$. This happens when $k_{\max}/\langle k \rangle_I(X)$ is large, which is common for heterogeneous degree distributions, such as power laws $p_k \propto k^{-\gamma}$ (see Appendix A). To circumvent this, we must update and sample efficiently from a heterogeneous distribution of propensities [Eq. (5)].

3.2.2 Composition and rejection for multiscale propensity

The high rejection probability of rejection sampling for heterogeneous propensities is illustrated by the gray portion in Fig. 2(a). The problem is the uniform proposal distribution (step 3 of Algorithm 1), which is a poor choice for a heterogeneous distribution of propensities. To improve upon rejection sampling, we need an algorithm that systematically constructs a proposal distribution keeping the rejection probability bounded, as illustrated in Fig. 2(b).

We propose to use a method of composition and rejection, similar to Ref. Slepoy, Thompson, and Plimpton (2008) for biochemical reaction networks, inspired by Ref. Devroye (1986). It is also a direct improvement over the 2-group method proposed in Ref. Cota and Ferreira (2017). The idea is to create a partition of the nodes into groups $\{V_g\}$ of similar propensities, with $g \in \{1, \dots, G\}$. Once a node gets infected (or more generally acquires a propensity $\omega_i > 0$), it is assigned to a group—in our implementation, the pair $(i, \omega_i)$ is stored in an array $V_g$. The probability to select a node is then factorized as $\Pr(i) = \Pr(g)\Pr(i \mid g)$.

The probability of selecting a group $g$ is

\[ \Pr(g) = \frac{W_g}{W(X)} , \tag{14} \]

where $W_g = \sum_{(i,\omega_i) \in V_g} \omega_i$ is the propensity associated with the group of nodes $V_g$. In practice, to select an array proportionally to $W_g$, we implement a binary decision tree, as illustrated in Fig. 3, where each leaf points to an array $V_g$. Starting from the root, it takes $\mathcal{O}(\log G)$ operations to choose one of the leaves. Once a process is chosen and executed, we update the array $V_g$, the propensity $W_g$, and recursively the parent values in the tree—one notes that the root value is in fact $W(X)$. Again, $\mathcal{O}(\log G)$ operations are needed for this task.

The probability of selecting a node $i$ within a group $g$ is

\[ \Pr(i \mid g) = \frac{\omega_i}{W_g} . \tag{15} \]

If the partition is wisely chosen, nodes are selected efficiently using rejection sampling, replacing $\omega_{\max}$ in Eq. (11) by a group-specific maximal propensity $\omega_{\max}^{(g)}$.

A systematic approach to construct the partition is to impose a minimum acceptance probability, say one half. This leads to an average number of trials upper-bounded by 2. Let us define the minimal propensity as

\[ \omega_{\min} = \mu + \lambda k_{\min} , \tag{16} \]

where $k_{\min}$ is the minimal degree in the network. We impose that the $g$-th group allows nodes with propensity $\omega_i \in [2^{g-1}\omega_{\min}, 2^{g}\omega_{\min})$, except for the last group, whose interval is closed and also allows $\omega_i = \omega_{\max}$. The group-specific maximal propensity is thus

\[ \omega_{\max}^{(g)} = 2^{g}\,\omega_{\min} , \tag{17} \]

and the number of groups required is

\[ G = \left\lceil \log_2\!\left( \frac{\omega_{\max}}{\omega_{\min}} \right) \right\rceil . \tag{18} \]
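The partition rule translates into a few lines of code. The sketch below (our illustration, with arbitrary parameter values) computes the group index of a propensity and the number of groups of Eq. (18):

#include <cmath>
#include <cstdio>

int main() {
    double lambda = 0.5, mu = 1.0;
    int kmin = 2, kmax = 1000;                       // hypothetical degree bounds
    double wmin = mu + lambda * kmin;                // Eq. (16)
    double wmax = mu + lambda * kmax;
    int G = (int)std::ceil(std::log2(wmax / wmin));  // number of groups, Eq. (18)

    auto group = [&](double w) {                     // 1-indexed group of propensity w
        return 1 + (int)std::floor(std::log2(w / wmin));
    };
    auto group_max = [&](int g) {                    // group-specific bound, Eq. (17)
        return std::exp2(g) * wmin;
    };
    double w = mu + lambda * 137;                    // propensity of a degree-137 node
    std::printf("G = %d, node goes to group %d (bound %.1f)\n",
                G, group(w), group_max(group(w)));
    return 0;
}

By construction, any propensity in group $g$ is at least half of the group bound $2^g \omega_{\min}$, which guarantees the minimum acceptance probability of one half.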

The complete procedure for the execution of a process using the node-based method with composition and rejection sampling is presented in Algorithm 2.

Input: $X$, $T$
Output: $\tau$ and updated input

1:get $W(X)$ from the root of $T$
2:choose $u_1 \in (0,1]$ uniformly at random
3:$\tau \leftarrow -\ln(u_1)/W(X)$
4:choose $u_2 \in [0,1)$ uniformly at random
5:select a leaf array $V_g$ from $T$ using $u_2$
6:select a pair $(i, \omega_i)$ uniformly at random from $V_g$
7:choose $u_3 \in [0,1)$ uniformly at random
8:if $u_3 < \omega_i/\omega_{\max}^{(g)}$ then
9:     choose node $i$
10:else
11:     go back to step 6
12:end if
13:choose $u_4 \in [0,1)$ uniformly at random
14:if $u_4 < \lambda k_i/\omega_i$ then ▷ transmission process
15:     choose a neighbor $j$ of node $i$ uniformly at random
16:     if $x_j = S$ then
17:         $x_j \mapsto I$
18:         add $(j, \omega_j)$ to its group array
19:         update $T$
20:     end if
21:else ▷ recovery process
22:     $x_i \mapsto S$
23:     remove $(i, \omega_i)$ from $V_g$
24:     update $T$
25:end if
Algorithm 2 Node-based method with composition and rejection sampling for an update of the SIS model
Figure 3: Decision tree used to select the arrays $V_g$.
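Putting the pieces together, the following sketch performs one composition-and-rejection node selection (steps 1-12 of Algorithm 2) on hypothetical data. For compactness, the group is picked here by a linear scan over the per-group sums, where the implementation described in the text uses the binary decision tree of Fig. 3 to achieve $\mathcal{O}(\log G)$ selections and updates.

#include <cmath>
#include <cstdio>
#include <random>
#include <utility>
#include <vector>

int main() {
    std::mt19937 rng(7);
    std::uniform_real_distribution<double> unif(0.0, 1.0);

    double lambda = 0.5, mu = 1.0;
    double wmin = mu + lambda * 1;                  // minimal propensity (k_min = 1)

    // Hypothetical infected nodes, identified here by their degree.
    std::vector<int> degrees = {1, 2, 3, 80, 500};
    double wmax = mu + lambda * 500;                // largest degree in this toy set
    int G = (int)std::ceil(std::log2(wmax / wmin)); // number of groups, Eq. (18)

    std::vector<std::vector<std::pair<int, double>>> V(G + 1); // 1-indexed groups
    std::vector<double> Wg(G + 1, 0.0);             // per-group propensity sums
    for (int k : degrees) {
        double w = mu + lambda * k;
        int g = 1 + (int)std::floor(std::log2(w / wmin));
        V[g].push_back({k, w});
        Wg[g] += w;
    }
    double W = 0.0;                                 // total rate = root of the tree
    for (int g = 1; g <= G; ++g) W += Wg[g];

    // Composition: pick a group with probability Wg/W, Eq. (14).
    double target = unif(rng) * W;
    int g = 1;
    while (g < G && target > Wg[g]) { target -= Wg[g]; ++g; }

    // Rejection within the group: acceptance >= 1/2 by construction.
    double bound = std::exp2(g) * wmin;             // Eq. (17)
    std::pair<int, double> pick;
    do {
        pick = V[g][(std::size_t)(unif(rng) * V[g].size())];
    } while (unif(rng) >= pick.second / bound);

    std::printf("picked a degree-%d node from group %d of %d\n", pick.first, g, G);
    return 0;
}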

For general networks with $k_{\max} = \mathcal{O}(N)$, the ratio of extreme propensities is $\omega_{\max}/\omega_{\min} = \mathcal{O}(N)$. Therefore, the average-case complexity per update is

\[ \mathcal{O}(\log G) = \mathcal{O}(\log \log N) . \tag{19} \]

For networks with a maximal degree independent of $N$, the complexity is $\mathcal{O}(1)$.

It is worth mentioning that our choice to impose a minimum acceptance probability of one half is not unique: one could instead build the groups from intervals growing by a factor $b > 1$, imposing a minimum acceptance probability $1/b$, and obtain the same average-case complexity. For $b < 2$, this increases the acceptance probability, but it also increases the number of groups $G$ required. Therefore, there is a trade-off to consider when one tries to minimize the required number of operations. We made several trials with $b \neq 2$, but it never resulted in noticeable improvements of the computation time.

3.3 Event-driven method

Another type of approach for the simulation of spreading processes has been considered lately Kiss, Miller, and Simon (2017); de Arruda, Rodrigues, and Moreno (2018). The philosophy is based on the next reaction method Gibson and Bruck (2000), originally proposed for the simulation of chemical reaction networks to improve upon the original Doob-Gillespie algorithm. The principal concept of this scheme is to draw an inter-event time $\tau_j$ for each specific process $j$, and execute the latter at absolute time $t_j = t_j^{(0)} + \tau_j$, where $t_j^{(0)}$ is the absolute time when the process was created. Therefore, one focuses on the execution time of each process independently, instead of inferring the global inter-event time and the first process to be executed among $\Omega(X)$, as in the standard Gillespie method.

For Markovian dynamics, the inter-event time $\tau_j$ before the execution of a process $j$ is exponentially distributed,

\[ P(\tau_j) = \omega_j\, e^{-\omega_j \tau_j} . \tag{20} \]

However, it is important to stress that this approach can also be applied to non-Markovian spreading processes Kiss, Miller, and Simon (2017); de Arruda, Rodrigues, and Moreno (2018).

To store and retrieve the processes efficiently, one can use a priority queue, where the highest priority element corresponds to the process with the lowest absolute execution time $t_j$. Recovery processes are eventually executed and can be stored directly in the priority queue. Transmission processes are stored only if the inter-event time for the transmission is smaller than the inter-event time for the recovery of the infected node. Depending on the implementation, one can also verify a priori if a neighbor node will already be infected or recovered, and prevent the storage of the transmission process.
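As an illustration of this storage scheme, here is a minimal sketch (our own toy example, not the implementation of Ref. Kiss, Miller, and Simon (2017)) using a min-heap ordered by absolute execution times; transmission events are discarded at creation if they would fire after the source's recovery, as described above.

#include <cstdio>
#include <queue>
#include <random>
#include <vector>

struct Event {
    double time;
    int node;
    bool transmission;       // false = recovery
    bool operator>(const Event& e) const { return time > e.time; }
};

int main() {
    std::mt19937 rng(3);
    double lambda = 0.6, mu = 1.0;                  // arbitrary placeholder rates
    std::exponential_distribution<double> rec(mu), trans(lambda);

    std::priority_queue<Event, std::vector<Event>, std::greater<Event>> queue;

    // Node 0 becomes infected at t = 0 with neighbors 1 and 2 (toy setting).
    double t = 0.0;
    double t_rec = t + rec(rng);                    // recovery time of node 0
    queue.push({t_rec, 0, false});
    for (int j : {1, 2}) {
        double t_trans = t + trans(rng);            // candidate transmission, Eq. (20)
        if (t_trans < t_rec)                        // discard if source recovers first
            queue.push({t_trans, j, true});
    }

    while (!queue.empty()) {                        // process events in time order
        Event e = queue.top();
        queue.pop();
        // A full simulation would update states and schedule new events here.
        std::printf("t = %.3f: %s node %d\n", e.time,
                    e.transmission ? "transmission to" : "recovery of", e.node);
    }
    return 0;
}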

To compare this approach with our node-based schemes, we used the implementation provided by Ref. Kiss, Miller, and Simon (2017), called the event-driven method, for which a detailed pseudocode is available for the SIS and SIR models. The interface is modular and general enough to be used for any network, a requirement for our benchmark calculations.

As for the standard Gillespie algorithm, a costly update involves the infection of a degree-$k$ node, for which

\[ \mathcal{O}(k \log N) \tag{21} \]

operations are required. While this quantity is probably not the most representative of the average computation time of the algorithm [see Fig. 4], we can clearly identify the different impact of the $k$ factor for dense, or sparse and heterogeneous, networks.

4 Efficiency of the stochastic simulation algorithms

Figure 4: Average computation time for a single update of the state for spreading processes using different algorithms. Each marker is averaged over 10 spreading sequences and 10 realizations of a random graph ensemble. The computation time of the event-driven method was rescaled to match the first marker of the composition and rejection method. (Upper row) Average over sequences of state transitions for the SIS model in the stationary state; the systems had been thermalized beforehand. (Lower row) Average over complete sequences of the SIR model, starting from a small initial density of infected nodes. (a) and (d) $G(N,p)$ random graphs with a fixed number of nodes $N$ and different average degrees $\langle k \rangle$. (b) and (e) Random graphs with an expected degree sequence Chung and Lu (2002); Miller and Hagberg (2011), drawn from a power-law expected degree distribution. (c) and (f) Same as (b) and (e), but with a larger power-law exponent (less heterogeneous networks). In all cases, the recovery rate is $\mu = 1$ and the transmission rate is scaled according to the threshold of the dynamics (see Sec. 4).

We have described different schemes for the simulation of spreading processes, with different associated complexities for the update of the state $X$. In this section, we show and discuss the impact on the computation time using synthetic networks. In Fig. 4, we compile the results of our benchmark calculations, using the SIS model in the stationary state and complete sequences of the SIR model, starting with a small density of infected nodes. We fix the recovery rate $\mu = 1$ without loss of generality. Furthermore, we scale the transmission rate $\lambda$ according to the underlying threshold of the dynamics. The intent is to always have a similar prevalence (SIS) or final size (SIR) in our simulations as we tune the network structure: as we discuss in Appendix B, different values of the order parameter can affect the expected number of required operations.

Implementations of the two node-based methods—hereafter referred to as rejection and composition and rejection—are in C++, while the implementation of the event-driven method is in Python Kiss, Miller, and Simon (2017); Miller (2018). This discrepancy of programming languages causes a multiplicative overhead of roughly 100 on the computation times of the event-driven method. To provide a fairer comparison, and since we are mostly interested in the scaling of the algorithms with the size of the network, we have rescaled the computation times obtained with the event-driven method to match the first marker of the composition and rejection method in each panel of Fig. 4.

4.1 Computation time for homogeneous networks

As a model of homogeneous networks, we used the $G(N,p)$ random graph ensemble Erdős and Rényi (1959). In the limit $N \to \infty$, the degree distribution is binomial, with mean $\langle k \rangle = p(N-1)$, and the degrees of the nodes are well represented by this mean value.

We observe in Fig. 4(a) that the average computation time for the event-driven method roughly scales linearly with the average degree, in agreement with Eq. (21). For the SIR model in Fig. 4(d), the dependence is less important, except for very large average degree.

As a comparison, both node-based methods are completely independent of the average degree [Figs. 4(a) and 4(d)]. This is in line with the two-step selection procedure for infection processes. A node-based scheme should therefore be preferred whenever one wants to sample networks with a large average degree, such as dense networks where $\langle k \rangle = \mathcal{O}(N)$.

One also observes that the rejection method is slightly more efficient [Fig. 4(a)] than the composition and rejection method: this is due to the simpler implementation of rejection sampling, without the need for a composition step. As discussed in Sec. 3.2.2, the average number of operations is only problematic when propensities span multiple scales; for homogeneous networks, the ratio $\omega_{\max}/\bar{\omega}(X)$ is expected to be $\mathcal{O}(1)$ for all states.

4.2 Computation time for heterogeneous networks

As a model of heterogeneous networks, we used random graphs with an expected degree sequence Chung and Lu (2002); Miller and Hagberg (2011), also called Chung-Lu graphs, where each node $i$ is assigned an expected degree $\hat{k}_i$. We used sequences drawn from a power-law distribution to generate heterogeneous networks, with a smaller exponent (more heterogeneous networks) in Figs. 4(b) and 4(e), and a larger exponent in Figs. 4(c) and 4(f).

We observe in Fig. 4(b) that the computation time for the event-driven method scales polynomially with the number of nodes. In Fig. 4(c), the computation time slightly increases, but with a much smaller exponent. This is explained by Eq. (21), with

\[ \langle k \rangle_I = \mathcal{O}\!\left( \frac{\langle k^2 \rangle}{\langle k \rangle} \right) \tag{22} \]

near the phase transition (see Appendix A). For the SIR model in Figs. 4(e) and 4(f), the computation times are less influenced by the size of the network. In this case, the number of operations predicted in Eq. (21) for the most costly processes overestimates the average computation time (as also noted for homogeneous networks).

We observe that the rejection method scales polynomially with the number of nodes as well, but this time the scaling exponent is larger for moderately heterogeneous networks [Figs. 4(c) and 4(f)] than for very heterogeneous networks [Figs. 4(b) and 4(e)] (a larger dispersion of the degree distribution implies a more heterogeneous network). This is roughly explained by Eq. (13), with $\bar{\omega}(X) = \mu + \lambda \langle k \rangle_I$ (see Appendix A) and $\omega_{\max} = \mu + \lambda k_{\max}$, leading to an average number of operations

\[ \mathcal{O}\!\left( \frac{\omega_{\max}}{\bar{\omega}(X)} \right) = \mathcal{O}\!\left( \frac{k_{\max} \langle k \rangle}{\langle k^2 \rangle} \right) . \tag{23} \]

Finally, we see that the computation time for the composition and rejection method is, for all practical purposes, independent of the number of nodes.

5 Conclusion

We have introduced a stochastic simulation algorithm for spreading processes on networks, combining a node-based perspective with the efficiency of composition and rejection sampling St-Onge (2018). This algorithm requires $\mathcal{O}(\log \log N)$ operations per update, making it superior (or at worst, equivalent) to other state-of-the-art algorithms. It is particularly well suited for the sampling of large and heterogeneous networks, since its average computation time is, for all practical purposes, independent of the network size and of the density of edges.

Note that there is an entire branch of the literature concerned with the efficient sampling of the quasistationary distribution of states for processes with an absorbing state, such as the SIS model (see Refs. de Oliveira and Dickman (2005); Sander, Costa, and Ferreira (2016); Cota and Ferreira (2017); Macedo-Filho et al. (2018) for instance). Indeed, finite-size analysis of the critical phenomenon requires the sampling of sequences that do not fall into the absorbing state. In this paper, we have focused on stochastic simulation algorithms that provide statistically exact sequences of states, which is also fundamental to sampling the quasistationary distribution. Therefore, our work contributes indirectly to this line of study.

Despite the fact that we have explicitly considered Markovian spreading processes, our composition and rejection scheme can be directly applied to certain classes of non-Markovian processes with completely monotone survival functions, using the Laplace transform, in the spirit of Ref. Masuda and Rocha (2018). It can also be used directly with a variety of spreading processes on time-varying networks, where the structure evolves independently from the dynamics St-Onge et al. (2018); Taylor, Taylor, and Kiss (2012), or co-evolves with it Gross, D'Lima, and Blasius (2006); Scarpino, Allard, and Hébert-Dufresne (2016). Extending this method to complex contagions Lehmann and Ahn (2018) would be another interesting avenue.

Finally, from a more general perspective, we can argue that the idea behind composition and rejection is a systematic and efficient regrouping of independent processes, especially suited for multiscale propensities, as discussed in Sec. 3.2. It would be simple to exploit this scheme for many of the stochastic processes studied in network science, such as multistate dynamical processes Gleeson (2013); Fennell and Gleeson (2017) or network generation models Krapivsky, Redner, and Leyvraz (2000); Hébert-Dufresne et al. (2011).

Acknowledgments

We acknowledge Calcul Québec for computing facilities. This research was undertaken thanks to the financial support from the Natural Sciences and Engineering Research Council of Canada (NSERC), the Fonds de recherche du Québec — Nature et technologies (FRQNT) and the Sentinel North program, financed by the Canada First Research Excellence Fund.

Appendix A Some results for uncorrelated random networks

Over the last two decades, many analytical results have been obtained for spreading processes on uncorrelated random networks with an arbitrary degree distribution $p_k$, i.e. networks for which the probability that a random edge ends on a node of degree $k$ is $k p_k / \langle k \rangle$, independently of the degree at the other end. To generate such uncorrelated networks, one must respect the structural cut-off Catanzaro, Boguñá, and Pastor-Satorras (2005), i.e. $k_{\max} \lesssim \sqrt{\langle k \rangle N}$.

In the infinite size limit, threshold estimates are obtained for spreading processes. For the SIS model, a good approximation is St-Onge et al. (2018)

\[ \lambda_c \simeq \mu \frac{\langle k \rangle}{\langle k^2 \rangle} , \tag{24} \]

while for the SIR model, the exact result is Volz (2008); Miller and Hagberg (2011)

\[ \lambda_c = \mu \frac{\langle k \rangle}{\langle k^2 \rangle - 2\langle k \rangle} . \tag{25} \]

Equations (24) and (25) were used in the simulations of Fig. 4.

Using the heterogeneous mean-field theory Boguñá and Pastor-Satorras (2002); Moreno, Pastor-Satorras, and Vespignani (2002), we can also approximate the degree distribution of infected nodes near the phase transition. On the one hand, for the SIS model in the stationary state, the probability that a node of degree $k$ is infected is

\[ \rho_k = \frac{\lambda k \Theta}{\mu + \lambda k \Theta} , \tag{26} \]

for a network with degree distribution $p_k$, and where $\Theta$ is the average fraction of infected neighbors for a susceptible node. Near the phase transition, $\Theta \to 0$, hence $\rho_k \propto k$, which means that the average degree of infected nodes is

\[ \langle k \rangle_I = \frac{\sum_k k\, \rho_k\, p_k}{\sum_k \rho_k\, p_k} \simeq \frac{\langle k^2 \rangle}{\langle k \rangle} . \tag{27} \]

On the other hand, if we consider a complete sequence of the SIR model, the average degree of recovered nodes is a good proxy, which is also $\simeq \langle k^2 \rangle / \langle k \rangle$ near the phase transition. For a power-law degree distribution $p_k \propto k^{-\gamma}$, a maximal degree $k_{\max} \propto \sqrt{\langle k \rangle N}$ and $2 < \gamma < 3$, the second moment of the degree distribution scales with the number of nodes as

\[ \langle k^2 \rangle = \mathcal{O}\!\left( N^{(3-\gamma)/2} \right) . \tag{28} \]
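This scaling follows from a standard continuum approximation of the second moment, sketched here for completeness:

\[ \langle k^2 \rangle \simeq \int_{k_{\min}}^{k_{\max}} k^2\, p_k\, \mathrm{d}k \propto \frac{k_{\max}^{\,3-\gamma} - k_{\min}^{\,3-\gamma}}{3-\gamma} = \mathcal{O}\!\left( k_{\max}^{\,3-\gamma} \right) , \]

so that the structural cut-off $k_{\max} \propto \sqrt{\langle k \rangle N}$ yields $\langle k^2 \rangle = \mathcal{O}(N^{(3-\gamma)/2})$ for $2 < \gamma < 3$.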

Appendix B Overhead factor due to phantom events

An element that we have discarded in our complexity analysis is the average fraction $\phi$ of phantom processes Cota and Ferreira (2017), i.e. executed processes that do not change the state $X$. One example is the transmission to an already infected node. It is assumed in our analysis of Sec. 3 that $\phi$ is upper-bounded by a constant $\phi^* < 1$, independent of the number of nodes in the system—this is well supported by the results of the composition and rejection scheme in Fig. 4. It is worth pointing out that the results of Fig. 4 are unbiased, since we counted the number of real transitions $X \to X'$ with $X' \neq X$ to evaluate the empirical average computation time.

For different values of the prevalence (SIS) or final size (SIR), phantom processes can lead to a certain overhead. We show in Fig. 5 the expected multiplicative factor $(1-\phi)^{-1}$ on the number of operations required for an update of the state, due to a fraction $\phi$ of phantom processes.

Near the phase transition, the overhead factor is negligible, but it can become important for a prevalence or final size near 1, where infected nodes are mostly surrounded by infected or recovered nodes. It would be possible to reduce the fraction of phantom processes in the node-based methods for the SIR model: we could count the number of susceptible neighbors $s_i$ of a newly infected node $i$, and modify the propensity to

\[ \omega_i = \delta_{x_i, I} \left( \mu + \lambda s_i \right) . \tag{29} \]

However, the update involving the infection of a degree-$k$ node would now require

\[ \mathcal{O}(k \log \log N) \tag{30} \]

operations—each infected neighbor of the newly infected node sees its propensity change—which could be worse in certain cases according to our study. Since $\mathcal{O}(\log \log N)$—or $\mathcal{O}(1)$ for an upper-bounded maximal degree—is a small number of operations, it is probably safer (and simpler) to keep the schemes introduced in Sec. 3.2.

Figure 5: Expected overhead factor for the update time of a state due to phantom events, using node-based methods. This factor is estimated by $(1-\phi)^{-1}$ and is measured for (a) the SIS model in the stationary state and (b) complete sequences of the SIR model. We sampled over 10 realizations of random graph ensembles: the $G(N,p)$ ensemble, and random graphs with an expected degree sequence drawn from a power-law distribution with a natural cut-off.
