Generalized threshold-based epidemics in random graphs: the power of extreme values

Abstract

Bootstrap percolation is a well-known activation process in a graph, in which a node becomes active when it has at least $r$ active neighbors. Such a process, originally studied on regular structures, has recently been investigated also in the context of random graphs, where it can serve as a simple model for a wide variety of cascades, such as the spreading of ideas, trends, and viral contents over large social networks. In particular, it has been shown that in $G(n,p)$ the final active set can exhibit a phase transition for a sub-linear number of seeds. In this paper, we propose a unique framework to study similar sub-linear phase transitions for a much broader class of graph models and epidemic processes. Specifically, we consider i) a generalized version of bootstrap percolation in $G(n,p)$ with random activation thresholds and random node-to-node influences; ii) different random graph models, including graphs with given degree sequence and graphs with community structure (block model). The common thread of our work is to show the surprising sensitivity of the critical seed set size to extreme values of distributions, which makes some systems dramatically vulnerable to large-scale outbreaks. We validate our results by running simulations on both synthetic and real graphs.


Michele Garetto
University of Torino

michele.garetto@unito.it
Emilio Leonardi
Politecnico di Torino

leonardi@polito.it
Giovanni-Luca Torrisi
IAC-CNR

torrisi@iac.rm.cnr.it



Many fundamental phenomena occurring in various kinds of complex systems, ranging from technological networks (e.g., transportation, communication, energy), to biological networks (e.g., neural, ecological, biochemical) and social networks (in the real world or over the Internet) can be described by dynamical processes taking place over the underlying graph representing the system structure. Such processes modify over time the internal state of nodes and spread across the network following the edges of the graph.

One of the most widely studied examples of such dynamical processes is the epidemic process, which starts from an initial set of infected nodes (usually referred to as seeds, chosen either deterministically or at random) that can pass the infection to other (susceptible) nodes (under many possible models), possibly causing a major outbreak throughout the network.

In our work we consider a generalized model for the spreading of an 'epidemic', in which nodes are characterized by an infection threshold $\Theta$ (either deterministic or random), and become infected when they collect from their neighbors an amount of influence larger than $\Theta$. A special case of our model is the well-known bootstrap percolation process, in which $\Theta$ is an integer ($\Theta = r \ge 2$) and each edge exerts an influence equal to one: simply put, a node becomes infected when it has at least $r$ infected neighbors.

Bootstrap percolation has a rich history, having been initially proposed in the area of statistical physics [?]. Due to its many physical applications (see [?] for a survey) it has been primarily studied over the years in the case of regular structures (lattices, grids, trees), most notably in a series of papers by Balogh and Bollobás (e.g., [?]). More recently, bootstrap percolation has been investigated also in the context of random graphs, which is the focus of this paper. In our work we are especially interested in epidemics occurring on very large, irregular structures such as those representing friendship relationships among people. This interest is motivated by the great popularity gained by online social platforms (e.g., Facebook, Twitter, Instagram, etc.), which, coupled with the increasing availability of always-on connectivity through mobile personal devices, has created an unprecedented opportunity for the rapid dissemination of various kinds of news, advertisements, viral videos, as well as a privileged environment for online discussion, creation and consolidation of beliefs, political opinions, memes and many other forms of collective reasoning. In this respect, bootstrap percolation provides a simple, primitive model that can be used to understand the diffusion of a generic ‘idea’ which requires a certain amount of ‘reinforcement’ from neighbors to be locally adopted.

Some results have already been obtained for particular random graph models. In particular, [?] first considered bootstrap percolation in the random regular graph, while [?] extended the analysis to random graphs with given vertex degrees (configuration model). The above two papers assume that the node degree is either fixed [?] or has both finite expectation and finite second moment [?], implying that the cardinality of the seed set must scale linearly with $n$ to observe a non-negligible growth of the epidemics. Both papers make use of the differential equation method to analyze the discrete Markov chain associated with the epidemic process. The analysis in [?] also allows the threshold to vary among the nodes.

A very different technique has been recently proposed in [?] to study bootstrap percolation in Erdös–Rényi graphs $G(n,p)$. This technique also allows the analysis of scenarios in which a sharp phase transition occurs with a number of seeds which is sublinear in $n$: below a critical seed set size $a_c$, for which one can get a closed-form asymptotic expression, the infection essentially does not evolve, whereas above the critical size almost all nodes get infected with high probability (w.h.p.). (Throughout this paper we shall use the following standard asymptotic notation. Let $f,g$ be two functions of $n$. We write $f = o(g)$ or $f \ll g$, and $g = \omega(f)$ or $g \gg f$, if $f/g \to 0$; $f = O(g)$ if there exist $C > 0$ and $n_0$ such that $f(n) \le C g(n)$ for any $n > n_0$; $f = \Theta(g)$ if $f = O(g)$ and $g = O(f)$. Unless otherwise specified, in this paper all limits are taken as $n \to \infty$.) In $G(n,p)$, this behavior is possible only when the average node degree itself grows with $n$ (i.e., $np = \omega(1)$). The technique proposed in [?] has been applied by [?] to power-law random graphs generated by the Chung–Lu model (with power-law exponent $\beta \in (2,3)$), obtaining the interesting result that, under bounded average node degree, a sublinear seed set size is enough to reach a linear fraction of the nodes.

Our work, too, started from the approach proposed in [?], which provides a simple and elegant way to explore phase transitions taking place at sub-linear scale. To operate at this scale, we let, if needed, the average node degree grow with $n$, since this can be considered an acceptable assumption in many cases. Indeed, real social networks (and in particular online social networks), which evolve over time with the addition/removal of nodes/edges, often exhibit the so-called densification phenomenon [?], meaning that the number of edges grows faster than the number of nodes (hence the average degree grows with time). (In practice, asymptotic results provide very good predictions of what happens in large but finite systems whenever the average degree is not too small.)

The main thread of our work is to show the high ‘vulnerability’ (in terms of critical number of seeds) that arises in networks when we add inhomogeneities in any one of many possible ways (i.e., by adding variability in thresholds, edge weights, node degree, or network structure). Although this effect has already been observed in epidemic processes, the way in which inhomogeneities affect bootstrap percolation can be so dramatic that just extreme values of distributions (and not their particular shape) can determine the critical size of the seed set. We believe that this result, which apparently has not been recognized before, is of fundamental importance to better understand the dynamics of epidemics in complex systems.

We start by introducing some background material and notation taken from [?], which is necessary to follow the rest of the paper. As already mentioned, [?] provides a full picture of standard bootstrap percolation in Erdös–Rényi graphs $G(n,p)$. Nodes are characterized by a common integer threshold $r \ge 2$, and the process starts with an initial set of vertices (the seeds), of cardinality $a$, which are chosen uniformly at random among the $n$ nodes. We will use the same terminology adopted in [?], where infected nodes are called 'active', whereas non-infected nodes are said to be inactive. An inactive node becomes active as soon as at least $r$ of its neighbors are active. Note that seeds are declared to be active irrespective of the state of their neighbors. Active nodes never revert to being inactive, so the set of active nodes grows monotonically.

The bootstrap percolation process naturally evolves through generations of vertices that become active. The first generation is composed of all those vertices which are activated by the seeds. The second generation of active nodes is composed of all the nodes which are activated by the joint effect of seeds and first-generation nodes, and so on. The process stops when either an empty generation is obtained or all nodes are active.
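As an illustration, the generation-based dynamics described above can be sketched as follows (a minimal Python sketch; the graph representation and helper names are ours, not from the paper):

```python
import random

def sample_gnp(n, p, rng):
    """Sample an Erdos-Renyi G(n,p) graph as an adjacency list."""
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def bootstrap_generations(adj, seeds, r):
    """Classic bootstrap percolation: sweep generation by generation
    until an empty generation is obtained."""
    active = set(seeds)
    while True:
        newly = [v for v in range(len(adj))
                 if v not in active
                 and sum(1 for u in adj[v] if u in active) >= r]
        if not newly:          # empty generation: the process stops
            return active
        active.update(newly)
```

For instance, on a toy 5-node graph with $r = 2$ and seeds $\{0,1\}$, the infection reaches exactly the nodes that accumulate two active neighbors at some generation.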

Now, it turns out that there is a useful reformulation of the problem that makes the process especially simple to analyze. This reformulation, which was originally proposed in [?], consists in changing the time scale, by introducing a virtual (discrete) time step $t$, such that a single active node is 'explored' at each time step (if the process has not yet stopped). By so doing, we forget about the generations, obtaining a more amenable process which is equivalent to the original one in terms of the final size of the epidemic.

The above reformulation requires us to introduce, besides the set $\mathcal{A}(t)$ of nodes which are active at time $t$, another set $\mathcal{Z}(t)$, referred to as used vertices, which is the subset of active vertices, of cardinality $t$, explored up to time $t$. More precisely, at time zero the set of active nodes is initialized to the seed set, while the set of used vertices is initialized to the empty set: $\mathcal{Z}(0) = \emptyset$. Each node $i$ is given a counter $X_i$, initialized to 0 at time $t = 0$.

At time $t = 1$ we arbitrarily choose a node $z_1$ among the seeds and we 'fire' its edges, incrementing by one the counter of all its neighbors. By so doing, we use node $z_1$, adding it to the set of used nodes, so that $\mathcal{Z}(1) = \{z_1\}$. We continue recursively: at each time $t$, we arbitrarily select an active node $z_t$ which has not been already used, i.e., $z_t \in \mathcal{A}(t-1) \setminus \mathcal{Z}(t-1)$, and we distribute new 'marks' to its neighbors which are not in $\mathcal{Z}(t-1)$, incrementing their counters. Node $z_t$ is added to the set of used vertices: $\mathcal{Z}(t) = \mathcal{Z}(t-1) \cup \{z_t\}$. We then check whether there are some inactive vertices that become active by effect of the marks distributed at time $t$ (i.e., vertices whose counter reaches $r$ at time $t$). Such newly activated vertices are added to the set of active vertices: $\mathcal{A}(t) = \mathcal{A}(t-1) \cup \{i : X_i(t) \ge r\}$ (note that no vertices can be activated at time 1, being $r \ge 2$).

The process stops as soon as $\mathcal{A}(t) \setminus \mathcal{Z}(t) = \emptyset$, i.e., when all active nodes have been used. Let $T$ denote the time at which this occurs. By construction, the final size of the epidemic is exactly equal to $T$: $|\mathcal{A}(T)| = |\mathcal{Z}(T)| = T$.
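The node-at-a-time reformulation can likewise be sketched in code. Under the same conventions as the sketch above (adjacency-list graph, our own helper names), the number of explored nodes at the stopping time equals the final epidemic size:

```python
def bootstrap_one_node_per_step(adj, seeds, r):
    """Reformulated process: at each virtual time step, one active unused
    node 'fires' its edges; returns T, which equals the final epidemic size."""
    active_list = list(seeds)       # active nodes, in order of activation
    active = set(seeds)
    counter = [0] * len(adj)
    t = 0
    while t < len(active_list):     # stop when all active nodes have been used
        z = active_list[t]          # use one active node per time step
        t += 1
        for u in adj[z]:
            if u not in active:
                counter[u] += 1     # distribute a 'mark'
                if counter[u] >= r: # threshold reached: u becomes active
                    active.add(u)
                    active_list.append(u)
    return t
```

On any graph this returns the same final size as the generation-based sweep, illustrating the claimed equivalence of the two formulations.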

The above reformulation of the problem is particularly useful because the counter associated to each inactive node $i$ can be expressed as:

$X_i(t) = \sum_{s=1}^{t} I_{s,i}$   (1)

i.e., as the sum of $t$ independent Bernoulli random variables $I_{s,i}$ of average $p$, each associated with the existence/non-existence of an edge in the underlying graph between the node used at time $s$ and node $i$. Indeed, it is perfectly sound to 'reveal' the edges going out of a node just when the node itself is used (principle of deferred decision). Moreover we can, for convenience, express the counters of all of the nodes at any time just like (1), without affecting the analysis of the final size of the epidemics. Indeed, by so doing we introduce extra marks that are not assigned in the real process (where each edge is revealed at most once, in a single direction), specifically when a used node is 'infected back' by a neighboring used node. However, this 'error' does not matter, since it has no impact on the percolation process. Note that counters expressed in such a way are independent from node to node.

The dynamics of the epidemic process are determined by the behavior of the number of 'usable' nodes (i.e., active nodes which have not been already used): $A(t) = |\mathcal{A}(t)| - |\mathcal{Z}(t)| = a + S(t) - t$, where $S(t)$ represents the number of vertices, which are not in the original seed set, that are active at time $t$. Note that the final size of the epidemics equals the first time $t$ at which $A(t) = 0$. Moreover, by construction, the number of used vertices at time $t$ equals $t$. Now, let $\pi(t)$ be the probability that an arbitrary node not belonging to the seed set is active at time $t$. There are $n - a$ such nodes, each active independently of the others, hence $S(t) \sim \mathrm{Bin}(n-a, \pi(t))$.
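The quantities just introduced are easy to evaluate numerically. A small sketch (helper names ours), assuming $\pi(t) = \Pr\{\mathrm{Bin}(t,p) \ge r\}$ and the mean trajectory $\mathbb{E}[A(t)] = a + (n-a)\pi(t) - t$:

```python
from math import comb

def pi_t(t, p, r):
    """Probability that a non-seed node is active at time t: P(Bin(t,p) >= r)."""
    return sum(comb(t, k) * p**k * (1 - p)**(t - k) for k in range(r, t + 1))

def mean_usable(t, a, n, p, r):
    """Mean number of usable nodes, E[A(t)] = a + (n-a)*pi(t) - t."""
    return a + (n - a) * pi_t(t, p, r) - t
```

As expected, $\pi(t)$ is increasing in $t$ and the mean trajectory starts at $a$.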

In essence, we need to characterize the trajectories of process $A(t)$ which, besides a deterministic component (decreasing with time), includes a random variable which is binomially distributed, with a time-dependent parameter (increasing with time):

$A(t) = a + \mathrm{Bin}(n-a, \pi(t)) - t$   (2)

In particular, whenever we can prove that, for a given $t^*$, w.h.p. $A(t) > 0$ for all $t < t^*$, then we can conclude that at least $t^*$ vertices get infected w.h.p. Similarly, if, for a given $t^*$, w.h.p. $A(t) = 0$ for some $t \le t^*$, we can conclude that the percolation terminates w.h.p. before $t^*$, and thus the final number of infected vertices will be smaller than $t^*$. We now present a simplified form of the main theorem in [?], together with a high-level description of its proof.

Theorem 2.1 (Janson [?])

Consider bootstrap percolation in $G(n,p)$ with threshold $r \ge 2$, and a number $a$ of seeds selected uniformly at random among the nodes. Let $p = p(n)$ be such that $n^{-1} \ll p \ll n^{-1/r}$. Define:

$t_c = \left( \dfrac{(r-1)!}{n\, p^r} \right)^{1/(r-1)}$   (3)

$a_c = \left( 1 - \dfrac{1}{r} \right) t_c$   (4)

If $a/a_c \to \alpha < 1$ (subcritical case), then w.h.p. the final size is $O(t_c) = o(n)$. If $a/a_c \ge \alpha$, for some $\alpha > 1$ (supercritical case), then w.h.p. the final size is $n - o(n)$.

Note that, under the above assumptions on $p$, the 'critical time' $t_c$ is such that both $t_c \to \infty$ and $t_c = o(n)$, and the same holds for the critical number of seeds $a_c$, which differs from $t_c$ just by the constant factor $1 - 1/r$, i.e., we get a phase transition for a sublinear number of seeds.

The methodology proposed in [?] to obtain the above result is based on the following idea: $S(t)$ is sufficiently concentrated around its mean that we can approximate $A(t)$ as $\mathbb{E}[A(t)] = a + (n-a)\pi(t) - t$. Now, for a wide range of values of $t$ (i.e., whenever $tp = o(1)$, and in particular around $t_c$), $\pi(t)$ can be expressed as $\pi(t) \approx (tp)^r/r!$. Therefore the function $f(t) = a + n (tp)^r/r! - t$ has a clear trend: it starts from $a$ at $t = 0$ and first decreases up to a minimum value reached at $t_c$, after which it grows to a value of the order of $n$. Hence, time $t_c$ acts as a sort of bottleneck: if $f(t_c)$ is positive (negative), we are in the supercritical (subcritical) case. Finally, we can compute the asymptotic value of $a_c$ by finding the minimum of function $f$.
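The bottleneck structure can be checked numerically. The sketch below (assuming the formulas $t_c = ((r-1)!/(np^r))^{1/(r-1)}$ and $a_c = (1-1/r)\,t_c$ stated above, and helper names of our own) verifies that $f(t)$ attains its minimum at $t_c$, and that $f(t_c) = 0$ exactly when $a = a_c$:

```python
from math import factorial

def critical_values(n, p, r):
    """Critical time and critical seed set size for bootstrap percolation in G(n,p)."""
    t_c = (factorial(r - 1) / (n * p**r)) ** (1.0 / (r - 1))
    a_c = (1.0 - 1.0 / r) * t_c
    return t_c, a_c

def f(t, a, n, p, r):
    """Approximate mean trajectory: f(t) = a + n*(t*p)^r / r! - t."""
    return a + n * (t * p)**r / factorial(r) - t
```

For example, with $r = 2$, $n = 10^6$ and $p = 10^{-4}$, one gets $t_c = 1/(np^2) = 100$ and $a_c = 50$; seeding with exactly $a_c$ seeds makes the trajectory graze zero at $t_c$.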

The result then follows considering that, starting from $a$ seeds, we get $A(0) = a$, and that by changing $a$ we deterministically move up or down the process $A(t)$. Hence, if we assume that $a/a_c$ is asymptotically bounded away from 1, we obtain a sufficient 'guard factor' around the trajectory of the mean process to conclude that the real process is either supercritical or subcritical (see Fig. 1).

Figure 1: Example of (asymptotic) trajectories of the mean number of usable nodes, $\mathbb{E}[A(t)]$. The plot also illustrates by shaded regions the concept of 'guard zone'.

We emphasize that in [?] the authors use a martingale approach to show that $S(t)$ is sufficiently concentrated around its mean, which allows them to establish their results w.h.p.

As a last premise, it is better to clarify why we assume $r \ge 2$. The reason is that the case $r = 1$, in which a node can be infected by just a single neighbor, is degenerate, and leads to the trivial fact that a single seed is enough to infect the entire connected component it belongs to. Hence, one has to apply a totally different set of tools [?] to characterize the final size of the epidemic. This case, however, is not interesting to us, since the networks of many real systems are connected by construction, or at least have a giant connected component. Hence, no phase transitions occur here in the number of seeds.

In this work we extend the approach of [?] along three ‘orthogonal’ directions that allow us to study more general threshold-based epidemic processes in inhomogeneous scenarios.

Figure 2: Examples of distributions of $\Theta$ and $W$ leading to the same (asymptotic) critical number of seeds $a_c$.
  1. We consider a generalized version of bootstrap percolation in $G(n,p)$, in which the thresholds of nodes are i.i.d. random variables $\Theta_i$, and infected nodes transmit a random amount of infection to their neighbors. Specifically, we assume that i.i.d. weights $W$ are assigned to the edges of the graph, representing the amount of infection transmitted through the edge. For this case, we obtain the asymptotic closed-form expression of the critical number of seeds, and an exponential law for the probability that the process is supercritical or subcritical, strengthening the results in [?] (where results hold, instead, w.h.p.). The most significant outcome of our analysis is that the critical number of seeds typically does not depend on the entire distribution of $\Theta$ and $W$, but just on values taken in proximity of the lower (for $\Theta$) and upper (for $W$) extreme of their support. For instance, in Figure 2 we show examples of two (discrete) distributions for $\Theta$ and two (discrete) distributions for $W$. It turns out that any combination of one threshold distribution and one weight distribution among them leads to the same asymptotic critical number of seeds $a_c$. Note that the various distributions have different means, and that one of the weight distributions has even negative mean.

  2. We extend the problem reformulation originally proposed in [?], where a single node is used at each time, to a similar reformulation in which a single edge is used at a time. This view is more convenient to apply the approach of [?] to other random graph models. In particular, we consider graphs with given degree sequence (configuration model), obtaining a closed-form expression of the asymptotic critical number of seeds $a_c$. We then compute the scaling order of $a_c$ for the particular (but most significant) case of a power-law degree sequence, considering a wider range of parameters with respect to the one studied in [?]. Again, we observe the interesting phenomenon that in some cases the precise shape of the degree distribution (i.e., the power-law exponent) does not matter, since $a_c$ is determined by the largest degree.

  3. We extend the analysis to the so-called block model, which provides a simple way to incorporate a community structure into a random graph model while preserving the analytical tractability of $G(n,p)$. We observe once more the interesting effect that the critical number of seeds might be determined by a single entry of the matrix of inter- (or intra-) community edge probabilities (i.e., by the most vulnerable community).

Although we consider (for simplicity) the above three forms of inhomogeneity 'in isolation', it is not particularly difficult to combine them, if desired. Indeed, we show that all extensions above can be studied within a unique framework. We emphasize that in this paper we generally assume that seeds are selected uniformly at random among the nodes, without knowledge of thresholds, weights, degrees, or network structure. This differentiates our analysis from existing works addressing the so-called influence maximization problem, i.e., finding the seed set that maximizes the final size of the epidemic (e.g., [?]). We observe that in the influence maximization framework many authors have already considered generalized models taking into account the impact of edge weights, node-specific thresholds, etc. (e.g., variants of the linear threshold model proposed in [?]). However, to the best of our knowledge, asymptotic properties of such generalized models are still not well understood. This paper takes a step forward in this direction, analysing sublinear phase transitions occurring when seeds are allocated uniformly at random in the network.

Interestingly, in all cases that we consider the epidemic is triggered among the most vulnerable nodes, and then it spreads out hitting less and less vulnerable components of the network. This fact can have dramatic consequences on the minimum number of seeds that can produce a network-wide outbreak.

In the following sections we present the above three contributions one at a time. Simulation experiments are presented along the way, to validate and better illustrate our analytical results.

We start by considering Erdös–Rényi random graphs $G(n,p)$, extending basic bootstrap percolation to the case in which node thresholds and/or node-to-node influences are i.i.d. random variables. We denote by $\Theta_i$ the (real-valued) threshold associated to node $i$. We then assign a (real-valued) random weight $W$ to each edge of the graph, representing the influence that one node exerts on the other (see later). Node $i$ becomes active when the sum of the weights on the edges connecting it to already active neighbors becomes greater than or equal to $\Theta_i$.

Recall that each edge of the graph is 'used' by the process at most once. Hence our analysis encompasses both the 'symmetric' case in which the influence (possibly) given by node $i$ to node $j$ equals the influence (possibly) given by $j$ to $i$, and the 'asymmetric' case in which the weights along the two directions of an edge are different (i.i.d.) random variables. In both cases, we can consider a single random weight $W$ on each edge.

We do not pose particular restrictions on the distributions of $\Theta$ and $W$, except for the following one, which avoids the degenerate case in which a node can get infected through a single edge (the case $r = 1$ in basic bootstrap percolation):

$\Pr\{W \ge \Theta\} = 0$   (5)

Note that we can also allow $W$ to take negative values, which could represent, in the context of social networks, neighbors whose behavior steers us away from the adoption of an idea. This generalization produces, indeed, rather surprising results, as we will see. However, negative weights require us to introduce some extra assumptions on the dynamics of the epidemic process, which are not needed when weights are always non-negative. Specifically, with negative weights we must assume that i) once a node becomes infected, it remains infected forever; ii) some random delays are introduced in the infection process of a node and/or on the edges, so that a node never receives the combined effect of multiple (simultaneous) influences from active neighbors. We argue that assumption ii) is not particularly restrictive, since in many real systems the influences received by a node take place atomically (e.g., a user reading ads, posts, tweets, and the like). Assumption i), instead, is crucial, because with negative weights counters no longer increase monotonically, and thus they can traverse the threshold many times in opposite directions. Assumption i) can be adopted, however, to study many interesting epidemic processes whose dynamics are triggered by nodes crossing the threshold for the first time. (For example, on some online platforms, notifications that a user has watched a given viral video, bought a product, expressed interest in an event, etc., might be sent immediately, and once, to his friends, no matter if the user changes his mind later on.)

The analysis of the general case can be carried out by exploiting the same problem reformulation described above, in which a single active node is used at each time step. Indeed, we can associate to inactive nodes a (real-valued) counter, initialized to 0 at time $t = 0$, which evolves according to:

$X_i(t) = \sum_{s=1}^{t} I_{s,i}\, W_{s,i}$   (6)

where $I_{s,i}$, $s \ge 1$, is a Bernoulli r.v. with average $p$ revealing the presence of an edge between the node used at time $s$ and node $i$, and $W_{s,i}$, $s \ge 1$, is the random weight associated to the same edge. Similarly to the basic case, the above expression of $X_i(t)$ can be extended to all nodes and all times, without affecting the results. By so doing, counters are independent from node to node.

We then re-define $\pi(t)$ as the probability that an arbitrary node which is initially inactive (take node 1) has become active at any time up to $t$:

$\pi(t) = \Pr\left\{ \max_{s \le t} X_1(s) \ge \Theta_1 \right\}$

With the above definition, the system behavior is still determined by the trajectories of process (2). We have:

$\pi(t) \overset{(a)}{=} \sum_{k=0}^{t} \binom{t}{k} p^k (1-p)^{t-k}\, q_k$   (7)

where equality (a) is obtained by conditioning over the number of variables $I_{s,1}$ equal to one, and we have defined:

$q_k := \Pr\left\{ \max_{1 \le j \le k} \sum_{s=1}^{j} W_s \ge \Theta \right\}$

which can be interpreted as the probability that a node, which has sequentially received the influence of $k$ infected neighbors, has become active. Let $q_\infty = \lim_{k \to \infty} q_k$. Note that, as a consequence of elementary properties of random walks, $q_\infty < 1$ when $\mathbb{E}[W] < 0$ (recall also (5)). We introduce the following fundamental quantity:

$r = \min\{k \in \mathbb{N} : q_k > 0\}$   (8)

In words, $r$ is the minimum number of infected neighbors that can potentially (with probability $q_r > 0$) activate a node. Note that, as a consequence of (5), it must be $r \ge 2$. For example, under the distributions shown in Fig. 2, the value of $r$ is determined only by the lower extreme of the support of $\Theta$ and the upper extreme of the support of $W$.
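For discrete distributions, $q_k$ and $r$ can be computed by brute-force enumeration of weight sequences. A sketch (exponential in $k$, fine for small $k$; the helper names are ours):

```python
from itertools import product

def q_k(k, w_vals, w_probs, th_vals, th_probs):
    """P(the running sum of k sequential i.i.d. weights ever reaches Theta)."""
    total = 0.0
    for seq in product(range(len(w_vals)), repeat=k):
        p_seq = 1.0
        for i in seq:
            p_seq *= w_probs[i]
        s, peak = 0.0, float('-inf')
        for i in seq:               # running maximum of prefix sums
            s += w_vals[i]
            peak = max(peak, s)
        # average over the (independent) threshold distribution
        p_cross = sum(tp for tv, tp in zip(th_vals, th_probs) if peak >= tv)
        total += p_seq * p_cross
    return total

def r_min(w_vals, w_probs, th_vals, th_probs, k_max=10):
    """Smallest k with q_k > 0, i.e., the quantity r defined in (8)."""
    for k in range(1, k_max + 1):
        if q_k(k, w_vals, w_probs, th_vals, th_probs) > 0:
            return k
    return None
```

For unit weights and constant threshold 2 this returns $r = 2$, recovering basic bootstrap percolation; for $W = \pm 1$ with equal probability and $\Theta = 2$, one gets $q_2 = 1/4$ and still $r = 2$.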

We are now in the position to state our main results for the generalized bootstrap percolation model in $G(n,p)$. First, we define:

$t_c = \left( \dfrac{(r-1)!}{n\, q_r\, p^r} \right)^{1/(r-1)}, \qquad a_c = \left( 1 - \dfrac{1}{r} \right) t_c$

Moreover, we shall consider the function:

$f(t) = a + n\, q_r\, \dfrac{(tp)^r}{r!} - t$

Theorem 4.1 (Super-critical case)

Under the assumptions $p \to 0$, $a = o(n)$, and $a \ge \alpha a_c$ for some constant $\alpha > 1$: for any fixed $\epsilon > 0$, the final size of the epidemic is at least $n(q_\infty - \epsilon)$ with probability $1 - e^{-c\, a_c (1+o(1))}$,

where $c > 0$ is a constant depending only on $\alpha$, $r$ and $\epsilon$, whose explicit expression is given in [?].

For the sub-critical case, we define the function
$g(x) = \dfrac{x - x^r/r}{1 - 1/r}$, for $x \in [0,1]$, and we denote by $\zeta$ the only solution of $g(\zeta) = \alpha$, $\zeta \in [0,1]$. (Function $g$ is continuous and strictly increasing on $[0,1]$, with $g(0) = 0$ and $g(1) = 1$.) Furthermore, having defined the interval $I_\epsilon = [(1-\epsilon)\,\zeta t_c,\ (1+\epsilon)\,\zeta t_c]$, it holds:

Theorem 4.2 (Sub-critical case)

Under the assumptions $p \to 0$ and $a \le \alpha a_c$ for some constant $\alpha < 1$: for any fixed $\epsilon > 0$, the final size of the epidemic falls in $I_\epsilon$ with probability $1 - e^{-c'\, a_c (1+o(1))}$,

where $\zeta$ and $I_\epsilon$ are defined as above, and $c' > 0$ is a constant whose explicit expression is given in [?].

We shall provide here a sketch of the proof of Theorems 4.1 and 4.2. The complete proofs, including all mathematical details, can be found in [?].

At high level, we can show that almost complete percolation occurs under super-critical conditions, by:

  • analysing the trajectory of the mean of process (2), $\mathbb{E}[A(t)] = a + (n-a)\pi(t) - t$, finding conditions under which the above quantity is positive (with a sufficient guard factor) for any $t \le n(q_\infty - \epsilon)$, for arbitrarily small $\epsilon$.

  • showing that the actual process $A(t)$ is sufficiently concentrated around its mean that we can conclude that w.h.p. $A(t) > 0$ for any $t \le n(q_\infty - \epsilon)$.

For the sub-critical regime we can use similar arguments, showing that $\mathbb{E}[A(t)]$ becomes negative at early stages, and that $A(t)$ is sufficiently concentrated around its average that we can claim that the actual process stops at early stages w.h.p.

We start from the asymptotic approximation of $\pi(t)$:

$\pi(t) = q_r \binom{t}{r} p^r\, (1 + o(1)) \sim q_r\, \dfrac{(tp)^r}{r!}$   (9)

which holds for any $t$ such that $tp = o(1)$. The above approximation allows us to write, for any such $t$:

$\mathbb{E}[A(t)] = a + n\, q_r\, \dfrac{(tp)^r}{r!}\,(1+o(1)) - t$

under the further assumption that $a = o(n)$. Thus, having defined the function $f(t) = a + n\, q_r\, (tp)^r/r! - t$, for $n$ large enough we can determine the sign of $\mathbb{E}[A(t)]$ for any $t = o(1/p)$ by analysing the behavior of $f(t)$. Elementary calculus reveals that $f$ has a unique minimum at:

$t_c = \left( \dfrac{(r-1)!}{n\, q_r\, p^r} \right)^{1/(r-1)}$

with $f(t_c) = a - a_c$, $a_c = \left(1 - \dfrac{1}{r}\right) t_c$. Thus, we obtain an asymptotic closed-form expression for the critical number of seeds (one can easily verify that, under the above assumptions, it holds $t_c \to \infty$, $t_c\, p = o(1)$, $a_c \to \infty$, $a_c = o(n)$).
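The generalized formulas reduce to the basic ones when $q_r = 1$. A small numerical sketch of the expressions above (helper names ours):

```python
from math import factorial

def generalized_critical(n, p, r, q_r):
    """t_c = ((r-1)!/(n*q_r*p^r))^(1/(r-1)) and a_c = (1-1/r)*t_c."""
    t_c = (factorial(r - 1) / (n * q_r * p**r)) ** (1.0 / (r - 1))
    a_c = (1.0 - 1.0 / r) * t_c
    return t_c, a_c
```

Note how a smaller $q_r$ (a less vulnerable population) inflates the critical number of seeds $a_c$.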

The difficult part of the proofs is to show that is sufficiently concentrated around its expectation that we can establish exponential bounds (as ) on the probability that the final size of the epidemics falls outside the intervals stated in Theorems 4.1 (super-critical case) and 4.2 (sub-critical case).

For the super-critical case, we adapt a methodology proposed in [?], which separately considers four time segments (the boundaries of all segments are to be meant as integers; to simplify the notation, we omit floor and ceiling symbols): i) an initial segment $[1, \beta t_c]$, where $\beta > 1$ is a constant (note that the process cannot stop at $t = 0$, being $A(0) = a > 0$); ii) an intermediate segment over which approximation (9) still holds; iii) a segment extending up to a constant fraction of $n$; iv) a final segment extending up to $n(q_\infty - \epsilon)$. Note that segment i) contains the most crucial, initial phase of the process.

Figure 3: Phase transitions in $G(n,p)$ for the three threshold distributions of $\Theta$, obtained by averaging the results of simulations. Analytical predictions are shown as vertical dotted lines.

The following lemma states a fundamental property related to segment i), which is the key to obtaining the result in Theorem 4.1:

Lemma 4.3

Under the assumptions of Theorem 4.1, let $\beta > 1$ be an arbitrarily fixed constant. Then:

$\Pr\{A(t) > 0,\ \forall\, t \le \beta t_c\} \ge 1 - e^{-c\, a_c (1+o(1))}$

where $c$ is given in the statement of Theorem 4.1.

The detailed proof is reported in the Appendix. We outline here the three main ingredients used to prove Lemma 4.3: i) we exploit standard concentration results for the binomial distribution, providing exponential bounds on $\Pr\{A(t) \le 0\}$ at any $t$ in the considered domain; ii) we employ the union bound to upper bound the probability $\Pr\{\exists\, t \le \beta t_c : A(t) \le 0\}$ by $\sum_t \Pr\{A(t) \le 0\}$; iii) we use the property that, under super-critical conditions, $f(t)$ stays bounded away from zero by a margin proportional to $a_c$ over the whole segment.

We emphasize that in this paper we employ techniques different from those used in [?], where the authors rely on concentration results for $S(t)$ derived from martingale theory (Doob's inequality). Instead, we combine deviation bounds specifically tailored to the binomial distribution (see the Appendix) with the union bound, obtaining a conceptually simpler approach which also permits us to obtain explicit exponential laws for the probabilities related to the final size of the epidemics (i.e., a stronger result with respect to the main Theorem 3.1 in [?], which holds just w.h.p.).

As an immediate consequence of Lemma 4.3, we can say that the process does not stop before $\beta t_c$ with probability $1 - e^{-c\, a_c (1+o(1))}$.

Considering that $f(t)$ quickly (super-linearly) increases after $t_c$ (as long as approximation (9) holds), we can expect that the process is extremely unlikely to stop in segment ii), if it survives the first bottleneck segment. The proof of this fact is reported in the Appendix, where we also handle segment iii).

Here we focus instead on the last temporal segment, where the value of $q_\infty$ comes into play in determining the final size of the epidemics. Indeed, we are going to show that at least $n(q_\infty - \epsilon)$ nodes are infected with probability approaching one. In general, we can assume that $q_k \to q_\infty$ as $k \to \infty$, with $q_\infty \le 1$. Given an arbitrary $\epsilon$ such that $0 < \epsilon < q_\infty$, we make use of concentration inequality (20) to bound, for $n$ large enough, the probability that $A(t)$ hits zero at some $t \le n(q_\infty - \epsilon)$. Exploiting (21), the above probability goes to 0 exponentially fast, proving that at least $n(q_\infty - \epsilon)$ nodes are infected. When $q_\infty < 1$, we can similarly show that no more than $n(q_\infty + \epsilon)$ nodes are infected: indeed, considering that $\pi(t) \le q_\infty$, we can apply (20) to show that the probability that the process stops before $n(q_\infty + \epsilon)$ tends to 1.

To validate our analysis, and understand how well asymptotic results can predict what happens in large (but finite) systems, we have run Monte-Carlo simulations of our generalized bootstrap percolation model. In each run we change both the identity of the seeds and the structure of the underlying graph. We compute the average fraction of nodes that become infected, averaging the results of 10,000 runs.
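A single Monte-Carlo run of the process with per-node random thresholds can be sketched as follows (unit edge weights; the parameters and helper names are illustrative, not the exact simulation setup used for the figures):

```python
import random

def epidemic_run(n, p, thresholds, a, rng):
    """One run: sample G(n,p), pick a random seeds, spread, return infected fraction."""
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    seeds = rng.sample(range(n), a)
    active = set(seeds)
    stack = list(seeds)
    cnt = [0] * n
    while stack:                        # use one active node at a time
        z = stack.pop()
        for u in adj[z]:
            if u not in active:
                cnt[u] += 1
                if cnt[u] >= thresholds[u]:
                    active.add(u)
                    stack.append(u)
    return len(active) / n

def average_infected(n, p, thresholds, a, runs, seed=0):
    """Average infected fraction over independent runs (graph and seeds re-drawn)."""
    rng = random.Random(seed)
    return sum(epidemic_run(n, p, thresholds, a, rng) for _ in range(runs)) / runs
```

Sweeping the number of seeds $a$ on a logarithmic grid and plotting the average infected fraction reproduces phase-transition curves of the kind shown in the figures.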

We first look at the impact of random thresholds, while keeping equal weight on all edges. We consider three different distributions of $\Theta$: i) a constant threshold equal to 2; ii) a uniform threshold over a finite set of integers with minimum 2; iii) a two-valued threshold with minimum 2. Note that all three distributions share the same minimum value $\Theta_{\min} = 2$ (hence the same $r = 2$), but their expected values are quite different. Moreover, $q_r = \Pr\{\Theta = 2\} = 1$ for the constant threshold, whereas the uniform and two-valued distributions share the same, smaller value of $q_r$.

The asymptotic formula provides the critical number of seeds $a_c$ in each scenario. We consider either a 'small' system or a 'large' system (with larger $n$ and smaller $p$). Results are shown in Fig. 3, using a log horizontal scale on which we have marked the values of $a_c$ derived from the asymptotic formula. We use the same line style for each threshold distribution, and different line widths to distinguish the small system (thick curves) from the large system (thin curves).

We make the following observations: i) the position of the phase transition (i.e., the critical number of seeds) is well estimated by the asymptotic formula; ii) despite having quite different shapes, the uniform and two-valued distributions lead asymptotically to the same critical number of seeds, as suggested by the results obtained in the large system, where the corresponding curves are barely distinguishable; iii) phase transitions become sharper for higher values of the critical number of seeds, confirming that the probability law by which the process is supercritical/subcritical depends strongly on $a_c$ itself (as stated in Theorems 4.1 and 4.2).

We next move to a scenario in which the threshold is fixed, , and we vary the weights on the edges. For simplicity, we consider a case in which the influence exerted between two nodes can take just two values: +1, with probability , and -1, with probability . Note that the average influence, , can even be negative, if we select . In this scenario, we have , , hence . We consider either a ‘small’ system, in which , , or a ‘large’ system, in which , , which produce the same value of , for any .

Figure: (left plot) Phase transitions in for fixed threshold , and random weights , with . (right plot) Results for the corresponding simple random walk.

Results are shown in Fig. [?] (left plot), using a log horizontal scale on which we have marked the values of derived from the asymptotic formula. We use the same line style for each value of , and different line width to distinguish the small system (thick curves) from the large system (thin curves). We observe that in the small system the average fraction of infected nodes saturates to a value significantly smaller than one for , although we expect that, as , all nodes should get infected in this case (for which ). In the large system, the discrepancy between simulation and asymptotic results disappears.

This phenomenon can be explained by considering that the counter of inactive nodes behaves as a simple random walk (i.e., with steps ) with an absorbing barrier at . Recall [?] that for this simple random walk the absorption probability is for , while it is equal to for . Moreover, the mean time to absorption (conditioned on the event that the walk is absorbed) is (see right plot in Fig. [?]). On the other hand, the time horizon of this equivalent random walk is limited by the node degree, since a node cannot receive more contributions to its counter than it has neighbors. In the small system the average degree () is too small to approach the asymptotic prediction, whereas in the large system the average degree () is large enough (i.e., much larger than the mean time to absorption) to observe convergence of the final size to the asymptotic prediction obtained with . Interestingly, a finite fraction of nodes (asymptotically, around 0.44) gets infected with , a case in which the average node-to-node influence is negative!
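This gambler's-ruin picture is easy to check numerically. The sketch below is our own illustration, with parameters q = 0.4 (probability of a +1 step) and barrier r = 2 chosen by us purely for illustration; for q < 1/2 the classical hitting probability of level r is (q/(1-q))^r, which for these values is (2/3)^2 ≈ 0.44. The finite horizon plays the role of the degree-limited time horizon discussed above.

```python
import random

def absorption_stats(q, r, horizon, runs=20000, seed=1):
    """Estimate the absorption probability and mean absorption time of a
    simple random walk with steps +1 (w.p. q) and -1 (w.p. 1-q), started
    at 0 and absorbed upon reaching r. The walk is truncated after
    'horizon' steps, mimicking the node-degree limit discussed above."""
    rng = random.Random(seed)
    absorbed, total_time = 0, 0
    for _ in range(runs):
        pos = 0
        for t in range(1, horizon + 1):
            pos += 1 if rng.random() < q else -1
            if pos == r:
                absorbed += 1
                total_time += t
                break
    p_abs = absorbed / runs
    mean_t = total_time / absorbed if absorbed else float("inf")
    return p_abs, mean_t
```

With a long horizon the estimate approaches the asymptotic value, while a short horizon (i.e., a small average degree) underestimates it, which is consistent with the saturation observed in the small system.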

Up to now we have considered the random graph model, and we have followed the same problem reformulation adopted in [?], in which a single node is used at a time, revealing all its outgoing edges. This approach is especially suitable to , since marks are i.i.d. binomial random variables. We now introduce an alternative description of the percolation process, in which a single edge is used at a time. This approach is more convenient for analyzing other random graph models, such as (graphs with a pre-established number of edges), (where all nodes have the same degree), or the configuration model.

We consider the (multi)-graph in which, starting from a graph with no edges, edges are sequentially added, each connecting two nodes selected (independently) uniformly at random. Note that by so doing we can generate parallel edges, as well as self loops. However, following the same approach as in Corollary 3 of [?], it is possible to show that sequences of events that occur w.h.p. over , occur w.h.p. also over , with denoting the class of (simple)-graphs having edges, with associated uniform probability law. Therefore our results apply to as well.

To analyze bootstrap percolation in , we consider the following dynamical process: when a node becomes active, all edges connecting this node to other nodes which are still inactive are denoted as ‘usable’, and added to a set of usable edges. At a given time step , one usable edge is selected uniformly at random from , adding one mark to the endpoint that was inactive (when the edge became usable), provided that this endpoint is still inactive. The selected edge is then removed from . Set is initialized with the edges connecting seeds to non-seeds. By construction, at most one node can become active at each time instant. Hence, denoting by the number of active nodes at time (initialized to a), we have .
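The edge-based dynamics just described can be rendered as the following minimal Python sketch (names are ours; for simplicity it uses the same fixed threshold r for every node, and draws a G(n, m) multigraph, so self-loops and parallel edges may occur):

```python
import random

def edge_based_run(n, m, num_seeds, r, seed=0):
    """Edge-based bootstrap percolation on a G(n, m) multigraph.
    One usable edge is consumed per time step; its inactive endpoint
    receives a mark and activates once it has collected r marks.
    Returns the final number of active nodes."""
    rng = random.Random(seed)
    edges = [(rng.randrange(n), rng.randrange(n)) for _ in range(m)]
    incident = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        incident[u].append(idx)
        incident[v].append(idx)

    active = [False] * n
    marks = [0] * n
    for s in rng.sample(range(n), num_seeds):
        active[s] = True

    # usable holds (edge, endpoint that was inactive when the edge was added)
    usable = []

    def add_usable(u):
        for idx in incident[u]:
            x, y = edges[idx]
            other = y if x == u else x
            if not active[other]:        # skips self-loops and active ends
                usable.append((idx, other))

    for s in range(n):
        if active[s]:
            add_usable(s)                # edges from seeds to non-seeds

    while usable:
        i = rng.randrange(len(usable))   # pick one usable edge u.a.r.
        usable[i], usable[-1] = usable[-1], usable[i]
        _, w = usable.pop()
        if active[w]:
            continue                     # endpoint activated in the meantime
        marks[w] += 1
        if marks[w] >= r:
            active[w] = True
            add_usable(w)
    return sum(active)
```

Note that each edge enters the usable set at most once (when its first endpoint activates while the other is still inactive), so at most one node can activate per time step, as stated above.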

Let be the probability that a node, which is not a seed, has been activated at time . While it is not easy to write an exact expression of , we can provide asymptotically tight bounds on , as follows:

This is because we can reveal the endpoint of a usable edge only when this edge is used, by choosing uniformly at random one of the nodes that were inactive at the time instant at which the considered edge became usable. Hence, an inactive node receives a mark at time with probability (independently of other previously collected marks). Furthermore, by construction, we have . At timescale , we can approximate as:

(11)

The dynamics of (whose size is denoted by ) obey the following equation:

where represents the (cumulative) number of edges activated at . The process stops at time . Similarly to the case, the number of nodes that have become active by time is the sum of identically distributed Bernoulli random variables with average . Indeed, .

Note that by construction marks are distributed only to inactive nodes, therefore a node stops receiving marks as soon as . Unlike in , however, the variables are not independent, given that at most marks have been distributed by time (i.e., ). Note that we still have .

As for the total number of edges activated by time , , we can express it as the sum of random variables associated with the nodes in , each representing the number of edges activated along with node (i.e., the number of edges connecting node with inactive nodes):

We can evaluate by dynamically unveiling, for every inactive edge, whether node is one of its endpoints (but not both). It turns out where is the time instant at which the -th node was activated. Indeed, represents the number of edges still to be activated at time , while is the probability that node is an endpoint (but not both) of any such edge. Observe that the variables are not independent, as a consequence of the fact that the sum of all edges in the graph is constrained to be . However, is conditionally independent of , with , given and . Moreover, for any we have:

(12)

In particular, the expectation of satisfies:

Moreover, under the assumption , since for , and , we have:

while . Recalling (11), we have in conclusion:

Now, similarly to the case of , we can determine the critical number of seeds by: i) determining necessary and sufficient conditions under which for some arbitrary and any ; in so doing we determine the critical number of seeds ; ii) exploiting the fact that is sufficiently concentrated around its mean for , where is a properly defined constant; iii) showing that for , can be bounded from below away from 0.

As for point i), we follow the same lines as for , defining the function , and finding the minimum of , which is achieved at:

with as long as . Observe that is the average node degree (replacing in the expression of obtained for ), while can be interpreted as the probability that two specific vertices are connected by at least one edge (replacing for ). Evaluating and imposing , we obtain the critical number of seeds:

(13)

which is exactly the same as what we get in through the substitution and .

As for points ii) and iii), we can proceed in analogy with the case of , exploiting standard concentration results. In particular, we first focus on time instants for suitable . We need to show that w.h.p., provided that for arbitrary (i.e., ). To this end, observe that from (12), the fact that and , and the above-mentioned property of conditional mutual independence of the variables , it follows that w.h.p., for any : with mutually independent and for an arbitrarily small . Finally, observe that can be easily bounded using inequalities (20) and (21).

As for point iii), we adopt arguments conceptually similar to those used for , exploiting the fact that increases quickly (super-linearly) after .

The edge-based problem reformulation described in the previous section can be easily extended to the configuration model , in which we specify a given degree sequence (possibly dependent on ) with associated empirical distribution function . For simplicity, we limit ourselves to describing the computation of the critical number of seeds . However, the approach can be made rigorous by following the same lines as for . As before, properties of multi-graphs apply as well to simple-graphs .
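For concreteness, sampling a multigraph from the configuration model amounts to a uniform random matching of half-edges (the ‘end-of-edges’ used below); a minimal sketch, with names of our own choosing:

```python
import random

def configuration_multigraph(degrees, seed=0):
    """Sample a multigraph with the given degree sequence by pairing
    half-edges uniformly at random (self-loops and parallel edges are
    possible, as in the multigraph version of the model)."""
    # One 'stub' (half-edge) per unit of degree of each node.
    stubs = [i for i, d in enumerate(degrees) for _ in range(d)]
    assert len(stubs) % 2 == 0, "degree sum must be even"
    rng = random.Random(seed)
    rng.shuffle(stubs)
    # Pair consecutive half-edges of the shuffled list.
    return [(stubs[k], stubs[k + 1]) for k in range(0, len(stubs), 2)]
```

Each self-loop contributes 2 to its node's degree, so the sampled multigraph matches the prescribed degree sequence exactly.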

Similarly to what we have done for , we focus on the evolution of the number of usable edges:

and compute the critical time by finding the minimum of .

The impact of node degree can be taken into account by evaluating the probability that a node with degree has been activated by time . Moreover, we need to consider the number of edges that a node contributes to after being activated. There are in total ‘end-of-edges’ in the network, so the probability that a given end-of-edge is active at time is . Hence, we can write: