Influence Spread in Social Networks: A Study via a Fluid Limit of the Linear Threshold Model

Srinivasan Venkatramanan,  Anurag Kumar, 
Department of Electrical Communication Engineering, Indian Institute of Science,
Bangalore - 560012, India.
E-mail: vsrini,anurag@ece.iisc.ernet.in
Abstract

Threshold based models have been widely used in characterizing collective behavior on social networks. An individual’s threshold indicates the minimum level of “influence” that must be exerted, by other members of the population engaged in some activity, before the individual will join the activity. In this work, we begin with a homogeneous version of the Linear Threshold model proposed by Kempe et al. [1] in the context of viral marketing, and generalize this model to arbitrary threshold distributions. We show that the evolution can be modeled as a discrete time Markov chain, and, by using a certain scaling, we obtain a fluid limit that provides an ordinary differential equation (o.d.e.) model. We find that the threshold distribution appears in the o.d.e. via its hazard rate function. We demonstrate the accuracy of the o.d.e. approximation and derive explicit expressions for the trajectory of influence under the uniform threshold distribution. Also, for an exponentially distributed threshold, we show that the fluid dynamics are equivalent to the well-known SIR model in epidemiology. We also numerically study how other hazard functions (obtained from the Weibull and loglogistic distributions) provide qualitatively different characteristics of the influence evolution, compared to traditional epidemic models, even in a homogeneous setting. We finally show how the model can be extended to a setting with multiple communities and conclude with possible future directions.

Keywords: influence spread, threshold models, fluid limits, SIR epidemic, hazard rate

I Introduction

Social networks play a fundamental role in the spread of information, ideas and influence among their members. The study of influence spread as a stochastic process has been of interest to sociologists for several decades [2]. Such diffusion processes have been used to characterize collective behavior [3], adoption of innovations [4, 5], etc. among a population of users. Similar models have also been developed independently in other domains to study epidemics [6], synchronization in biological systems [7], etc.

Online social networks such as Facebook and Twitter, with their widespread adoption, have enabled information spread on a scale heretofore unimaginable. A significant fraction of online traffic comprises user-generated content on platforms such as Wordpress (text), Flickr (images), YouTube (videos), etc. In most such platforms, users can obtain information on the global popularity of an item of content, for instance the number of views for a YouTube video. Global metrics such as viewcount provide a crude signal to the user about the quality of the content. Given the limited attention span and vast quantity of content available on the Internet, the tendency of a particular user to view a video or read an article increases with the number of people who have already viewed/read it. Hence a YouTube video with many views or a news article with many “Likes” on Facebook is more likely to be accessed than others. The understanding and prediction of popularity evolution [8, 9] of such content is crucial to the content provider for (i) choosing appropriate content caching strategies for quick delivery, and (ii) deploying better advertisement mechanisms for increased monetization.

Related Work: Threshold models are well established for modeling the evolution of popularity in human populations. Everett [4] explored the adoption of innovations, employing examples from rural sociology, and noted the diversity in people’s propensity to adopt an innovation. He categorized them into various adopter groups (see Figure 1) in what is now known as Everett’s bell curve. (Everett’s use of the term “bell curve” is not to be taken to imply that the thresholds are normally distributed.) It is used to represent the process of adoption of a new product over time. Under the threshold interpretation, users in the leftmost class (innovators) can be interpreted as having the least threshold to adopt an innovation, and the ones in the rightmost class (laggards) as having the highest threshold (and hence being the least susceptible).

Fig. 1: Everett’s “bell curve”. It is also known as the Technology Adoption Lifecycle since it represents the adoption of a new innovation over time. The bell shaped curve represents the incremental change in adoption over time, whereas the “s” shaped curve shows the cumulative adoption. Under the threshold interpretation, the early adopters (innovators) can be thought of as having the least threshold for adoption, and so on. (Source [4])

Granovetter, in his seminal work [3] on collective behavior, aimed to use threshold distributions to model the spread of binary decisions among a group of rational agents, for instance during riots, voting, etc. Using his model, he calculated the equilibrium, i.e., steady state split of the population (between the binary decisions) given the threshold distribution, and also considered the stability of such equilibria. Valente [5] further refined this approach to threshold phenomena based on personal networks (local neighbourhood of an individual) as against whole social systems, and empirically studied datasets on the adoption of medical and rural innovation.

Domingos and Richardson [10] studied influence spread in the context of viral marketing, and they posed the algorithmic question of maximizing the spread of influence, given the underlying social influence network. Kempe et al. continued the algorithmic approach in [1], where they studied the influence maximization problem under two different activation models (the Linear Threshold model and the Independent Cascade model). They proved the submodularity of the influence function (i.e., the set valued function that maps the initial “seed” set of adopters to the final set), and provided greedy approximation algorithms to maximize influence spread under the linear threshold model. Recently, in the context of user-generated content on the Internet, game theoretic analysis has shown that threshold based policies could emerge as equilibria when users seek to maximize their utility (based on the perceived quality of the content) [11].

Missing Link: Our work in this paper is inspired by the earlier efforts to employ a threshold model for the propensity of a user to be influenced by others in the population [1], [3]. We will now discuss certain modeling details in these two important models of threshold based spread of influence, to motivate our work. Granovetter [3], rooted in sociology, considered general influence threshold distributions, and characterized the spread of influence by a simple difference equation. For instance, if the threshold is distributed with c.d.f. $F(\cdot)$, letting $r(t)$ denote the fraction of influenced individuals at time $t$, the evolution is described by the difference equation

$$r(t+1) = F(r(t)) \tag{1}$$

and thus, the equilibrium outcome is a fixed point of Equation 1 (see Figure 2). This approach provides an explicit dynamics of influence spread, and characterizes the fraction of the population that is eventually influenced. Though the analysis seems reasonable at first glance, careful inspection reveals that the implied system dynamics will involve nodes resampling their thresholds at every timestep. One should note that the threshold distribution is introduced to capture the variation in the unknown norms/preferences of the individuals in the population, but once sampled, they should remain unaltered during the process. Also, in Equation 1, there is no distinction between nodes that are already active and the nodes that are still susceptible to influence. This contradicts the assumption that the spread of influence is progressive (where nodes once influenced will remain active until the end of the process).

On the other hand, Kempe et al. [1] assumed a uniform distribution for the influence threshold, and focused on the algorithmic problem of selecting an initial seed set (of a given size) so as to maximize the final influenced set. Although the model has progressive spread dynamics, the evolution itself was not a primary concern in [1]. It was explicitly noted that the thresholds need to be sampled only once at the beginning of the process.

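Finally, for the uniform distribution of threshold on $[0,1]$ (as assumed throughout in [1]), the c.d.f. is the identity on $[0,1]$, so every state is a fixed point of Equation 1. The equation therefore does not yield any useful insight, i.e., it does not predict a spread of influence.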
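The fixed point behaviour discussed above can be illustrated with a minimal sketch. It assumes the Granovetter recursion has the form r(t+1) = F(r(t)), with r(t) the influenced fraction; the function names and the normal threshold distribution (mean 0.25, standard deviation 0.1) are purely illustrative choices and are not taken from [3].

```python
import math

def iterate_granovetter(cdf, r0, steps=200, tol=1e-12):
    """Iterate r(t+1) = F(r(t)) from r(0) = r0 and return the trajectory."""
    traj = [r0]
    for _ in range(steps):
        nxt = cdf(traj[-1])
        traj.append(nxt)
        if abs(nxt - traj[-2]) < tol:  # reached a fixed point (numerically)
            break
    return traj

def uniform_cdf(x):
    """C.d.f. of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)

def normal_cdf(x, mu=0.25, sigma=0.1):
    """C.d.f. of a normal threshold distribution (illustrative parameters)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Uniform thresholds: F is the identity on [0, 1], so nothing moves.
uniform_traj = iterate_granovetter(uniform_cdf, 0.1)
# Normal thresholds: starting above the unstable fixed point, influence grows.
normal_traj = iterate_granovetter(normal_cdf, 0.2)
```

Under the uniform c.d.f. the trajectory stays at its starting value, which is the sense in which Equation 1 predicts no spread in that case.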

Thus, although both Granovetter [3] and Kempe et al. [1] work with influence threshold distributions, the dynamics of the influence process are quite different. Further, although Kempe et al. work only with uniformly distributed thresholds, Granovetter permits more general threshold distributions. The point of departure of our work is to adopt the idea of general threshold distributions from [3], while retaining the more natural model of sampling each individual’s threshold just once in the beginning, and of the progressive spread of influence from [1].

Fig. 2: Granovetter’s fixed point dynamics. The intersection(s) of the cumulative distribution function with the $45^\circ$ line indicate the equilibrium points, i.e., the solutions to the fixed point Equation 1. (Source [3])

Our Contributions: We adopt a fluid limit approach for analytically characterizing the dynamics of the spread of influence in the Linear Threshold model, with general threshold distributions. In order to do this, in Section II we propose a homogeneous version of the Linear Threshold model called the Homogeneous Influence Linear Threshold (HILT) model, with arbitrary threshold distribution. In Section III we characterize the evolution of influence in the HILT model as a Markov process, and using Kurtz’s theorem [12], we derive a system of ordinary differential equations (o.d.e.). We provide simulation results that show that the o.d.e. approximates the original process fairly well for large values of $N$, the population size. In Section IV, we explicitly solve the o.d.e. for the uniform threshold distribution, thus providing an explicit characterization of the evolution of influence in the model of Kempe et al. [1]. We also provide an analytical expression for the terminal spread of influence and use it to address some optimization problems (Section V).

We note that the threshold distribution features in the o.d.e. via the hazard function [13], commonly used in survival/failure analysis. To the best of our knowledge, this is the first work that incorporates the hazard function to characterize variation among the individuals in an epidemic model. The variation of risk as an epidemic progresses has been empirically observed in recent studies in veterinary medicine [14], which have highlighted the need to “consider possible differences in the risk of infection among subgroups in the population”.

We then proceed to study the effect of the threshold distribution in Section VI. We also note that under the exponential threshold distribution, the fluid dynamics of the HILT model is equivalent to that of the classic SIR model from epidemiology [15], thus providing another interesting link between the influence spread and epidemics literature. Finally, we also show that the analysis can be extended to a heterogeneous system with communities (Section VII), and conclude with some possible future directions.

Comment on Network Topology: It has been noted that network topology plays a crucial role in the spread of influence on a social network [16]. In this work, however, we consider only completely connected graphs while deriving the fluid limit equations. While it is necessary to study the impact of degree distribution on the fluid dynamics, our primary focus is to provide the missing link between Kempe’s model [1] and Granovetter’s model [3], and thus we do not discuss the effect of network topology in this paper.

II Mathematical Model

We shall first introduce the network model described in Kempe et al. [1]. The social network is a weighted directed graph $G = (V, E)$, where the edge weight $w_{i,j}$ gives a measure of influence of node $i$ on node $j$. The activation process (see Figure 4) begins with an initial set $\mathcal{A}(0)$ of active, infectious nodes and takes place in discrete time steps. Each active node spreads its influence to each of its inactive neighbours. By the activation process, some of the neighbours become activated to be part of $\mathcal{A}(1)$, and can spread their influence in the next step. At the end of each step $k$ the population is partitioned into three sets of nodes: $\mathcal{D}(k)$, the nodes that were just activated in that step (also referred to as infectious nodes); $\mathcal{B}(k)$, the active nodes that have already exercised their influence and, hence, are no longer infectious; and the set of inactive nodes ($V \setminus \mathcal{A}(k)$). Note that $\mathcal{A}(k) = \mathcal{B}(k) \cup \mathcal{D}(k)$. The activation process stops at a random time $\tau$ when there are no more infectious nodes, i.e., $\mathcal{D}(\tau) = \emptyset$, and a terminal set $\mathcal{A}(\tau)$ is reached, from where the activation process cannot proceed further. We also assume that once a node has become active, it cannot become inactive (progressive case).

II-A Linear Threshold model

An activation model describes how the infectious nodes cause the inactive nodes to become active (and infectious). There are two widely used activation models, namely, the Linear Threshold model and the Independent Cascade model, proposed in [1]. Our work in this paper begins with the Linear Threshold (LT) model. In the LT model, $\sum_{i} w_{i,j} \le 1$ for every node $j$, i.e., the maximum possible influence on any node is bounded by 1 (see Figure 3). In this model, each node $j$ randomly chooses a threshold $\theta_j$ from the uniform distribution on $[0,1]$ at the beginning. An inactive node receives influence from all its active neighbours, and gets activated once the net received influence exceeds the chosen threshold. In other words, node $j$ gets activated in step $k$ if it had been inactive until step $k-1$, i.e., $j \notin \mathcal{A}(k-1)$, and

$$\sum_{i \in \mathcal{A}(k-1)} w_{i,j} \ge \theta_j.$$

Fig. 3: An influence graph of a social network under the Linear Threshold model as introduced by Kempe et al. [1].
Fig. 4: Evolution of the set of influenced nodes under the Linear Threshold model.
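The activation rule of the Linear Threshold model can be sketched in a few lines. This is an illustrative simulation only, under the assumption that a node activates once the summed weight from its active in-neighbours reaches its threshold; `simulate_lt`, the weight-matrix layout `weights[i][j]` and the three-node chain are hypothetical names and data introduced for this example.

```python
import random

def simulate_lt(weights, seeds, rng):
    """Run one realisation of the progressive Linear Threshold process.

    weights[i][j] is the influence of node i on node j.
    Returns the terminal active set and the list of sets activated per step.
    """
    n = len(weights)
    # Thresholds are sampled once, before the process starts.
    thresholds = [rng.random() for _ in range(n)]
    active = set(seeds)
    history = [set(seeds)]
    while history[-1]:  # stop when the last step activated nobody
        newly = set()
        for j in range(n):
            if j in active:
                continue  # progressive case: active nodes stay active
            received = sum(weights[i][j] for i in active)
            if received >= thresholds[j]:
                newly.add(j)
        active |= newly
        history.append(newly)
    return active, history

# A directed chain 0 -> 1 -> 2 with full-weight edges: influence must cascade.
chain = [[0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [0.0, 0.0, 0.0]]
final_active, steps = simulate_lt(chain, {0}, random.Random(1))
```

The last entry of `steps` is always the empty set, which marks the step at which no infectious nodes remain.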

II-B HILT Network model

Consider the population to be a social network of $N$ nodes where the graph is complete, and each edge carries the same weight $\Gamma$. Also, let the thresholds be chosen from an arbitrary threshold distribution with cumulative distribution function (c.d.f.) $F$. We call this the Homogeneous Influence Linear Threshold (HILT) network model. The influence matrix is given as follows: for all $i$,

$$w_{i,i} = 0$$

and

$$w_{i,j} = \Gamma \quad \text{for all } j \neq i.$$

Fig. 5: The HILT Network model

Carrying over the assumption in [1]’s model, we will assume that $(N-1)\Gamma \le 1$ when dealing with the uniform threshold distribution. This is because, under the uniform distribution, the maximum threshold is $1$, and it does not make sense to consider influences greater than $1$. In Section VI, when discussing various threshold distributions with unbounded support, we show that this restriction can be removed.
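Because all nodes and edges are exchangeable in the HILT model, one realisation can be simulated by tracking only the number of active nodes. The following is a minimal sketch under two assumptions that are ours rather than the text's: the common edge weight is normalised as `gamma / (n - 1)`, and a node activates once the total received influence reaches its threshold. The names `simulate_hilt` and `sample_threshold` are hypothetical.

```python
import random

def simulate_hilt(n, gamma, n_seeds, sample_threshold, rng):
    """Return the active-set sizes A(0), A(1), ... for one HILT run."""
    edge_weight = gamma / (n - 1)  # assumed normalisation of the edge weight
    # Thresholds of the non-seed nodes are drawn once and then sorted, so that
    # nodes activate in increasing order of threshold.
    thresholds = sorted(sample_threshold(rng) for _ in range(n - n_seeds))
    active = n_seeds
    next_idx = 0  # thresholds[:next_idx] belong to already-activated nodes
    sizes = [active]
    while True:
        influence = edge_weight * active  # same for every inactive node
        newly = 0
        while next_idx < len(thresholds) and thresholds[next_idx] <= influence:
            next_idx += 1
            newly += 1
        if newly == 0:
            break  # no infectious nodes left: terminal set reached
        active += newly
        sizes.append(active)
    return sizes

# Uniform thresholds, 10% of a 20000-node population seeded, gamma = 0.5.
sizes = simulate_hilt(20000, 0.5, 2000, lambda r: r.random(), random.Random(0))
final_fraction = sizes[-1] / 20000
```

Passing a different `sample_threshold` (for example `lambda r: r.expovariate(2.0)`) changes only the threshold distribution, which is the generalisation the HILT model is meant to allow.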

III A Scaled Markov Chain and its Fluid Limit

Consider the HILT model on $N$ nodes, with edge weights $\Gamma$ such that $(N-1)\Gamma = \gamma$, and the threshold distribution at the nodes given by $F$. In this section we will use Kurtz’s theorem [12] to obtain a two dimensional o.d.e. that can serve as a fluid approximation for the evolution of the stochastic processes in the HILT model.

Let $A(k)$ and $D(k)$, respectively, be the sizes of the active and infectious sets at time $k$. In the HILT model, due to homogeneity, the precise membership of these sets is irrelevant and it is sufficient to keep track of set sizes. Instead of $(A(k), D(k))$, we will work with $(B(k), D(k))$ to distinguish the active nodes that have exercised their influence from the infectious nodes. Recall that $B(k)$ is the size of the subset of active nodes that have exercised their influence by time $k$, whereas $D(k)$ is the size of the subset of active nodes at time $k$ that have not yet had a chance to exert their influence on the inactive nodes. It is easy to observe that $(B(k), D(k))$ is a discrete time Markov chain (DTMC) (see Appendix A). By definition, $A(k) = B(k) + D(k)$.

III-A A Scaled Markov Chain

In order to obtain an approximating o.d.e., we need to work with an appropriately scaled Markov process $(B^N(t), D^N(t))$, which can be thought of as evolving on a time scale $N$ times faster than that of the original system. We can visualize this process as evolving over “minislots” of duration $1/N$, whereas the original process evolves at the epochs $k = 0, 1, 2, \ldots$. Since this new process runs on a faster time scale, we need to slow down its dynamics. In each minislot, each node in $\mathcal{D}$ decides to spread its influence with probability $1/N$ or defer with probability $1 - 1/N$. In the former case, it contributes its influence of $\Gamma$ and then moves to the set $\mathcal{B}$, else it stays in set $\mathcal{D}$ (see Figure 6). A similar scaling has been used in the context of the analysis of random multi-access algorithms by Bordenave et al. [17]. The reason for such a scaling is explained in Appendix B, where we contrast it with the traditional amplitude and time scaling. The evolution of this process can be written as follows:

Fig. 6: Evolution of the scaled process

where and are zero mean random variables.

Dividing the evolution equations by $N$ (the number of nodes in the network) and defining $\tilde{B}^N(t) = B^N(t)/N$ and $\tilde{D}^N(t) = D^N(t)/N$, we can obtain the drifts for the fraction of nodes in each state.

Let and denote the mean drifts of .

Consider the limiting drift function of and observe that,

Now consider and and define,

Theorem 1

Given the Markov process , we have for each and each ,

where is the unique solution to the ODE,

with initial conditions .

Proof:

This is essentially an instance of Kurtz’s theorem [12]; also see [18]. In Appendix C we provide the statement of Kurtz’s theorem for our context, and verify the necessary conditions to guarantee the convergence of the Markov processes to the fluid limit o.d.e.

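Remark: We know that the hazard function corresponding to the c.d.f. $F$, with density $f$, is given by

$$h_F(x) = \frac{f(x)}{1 - F(x)}$$

and hence the o.d.e. becomes,

$$\dot{b}(t) = d(t) \tag{2}$$
$$\dot{d}(t) = -d(t) + \gamma\, h_F(\gamma b(t))\, \big(1 - b(t) - d(t)\big)\, d(t) \tag{3}$$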
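For readers who wish to experiment with the fluid limit numerically, the following sketch integrates the o.d.e. with a classical fourth-order Runge-Kutta scheme for an arbitrary hazard function. It assumes the o.d.e. has the form db/dt = d, dd/dt = -d + gamma * h(gamma * b) * (1 - b - d) * d, with b(0) = 0 by default; the function names, step size and time horizon are illustrative choices, not the settings used for the figures.

```python
def hilt_rhs(b, d, gamma, hazard):
    """Right-hand side of the assumed fluid-limit o.d.e. for (b, d)."""
    db = d
    dd = -d + gamma * hazard(gamma * b) * (1.0 - b - d) * d
    return db, dd

def integrate_hilt(hazard, gamma, d0, b0=0.0, t_end=30.0, dt=0.01):
    """Integrate with classical RK4; returns a list of (t, b, d) tuples."""
    steps = int(round(t_end / dt))
    b, d = b0, d0
    traj = [(0.0, b, d)]
    for n in range(1, steps + 1):
        k1b, k1d = hilt_rhs(b, d, gamma, hazard)
        k2b, k2d = hilt_rhs(b + 0.5 * dt * k1b, d + 0.5 * dt * k1d, gamma, hazard)
        k3b, k3d = hilt_rhs(b + 0.5 * dt * k2b, d + 0.5 * dt * k2d, gamma, hazard)
        k4b, k4d = hilt_rhs(b + dt * k3b, d + dt * k3d, gamma, hazard)
        b += dt * (k1b + 2.0 * k2b + 2.0 * k3b + k4b) / 6.0
        d += dt * (k1d + 2.0 * k2d + 2.0 * k3d + k4d) / 6.0
        traj.append((n * dt, b, d))
    return traj

def uniform_hazard(x):
    """Hazard of the uniform distribution on [0, 1]; valid only for x < 1."""
    return 1.0 / (1.0 - x)

traj = integrate_hilt(uniform_hazard, gamma=0.5, d0=0.1)
t_end, b_end, d_end = traj[-1]
```

Any other hazard function (a constant for exponential thresholds, a power law for Weibull thresholds) can be passed in place of `uniform_hazard` without changing the integrator.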

III-B Accuracy of the o.d.e. approximation

Figure 7 shows the convergence of the scaled process to the o.d.e. with increasing network sizes and for and . We observe that for the o.d.e. approximates the scaled process fairly well.

As noted in Appendix B, the probabilistic scaling does not exactly replicate the original process. Hence, we have also compared the evolution of the original unscaled process for a fixed value of , with the o.d.e approximation. The results are shown in Figure 8, where multiple sample paths of the original process (obtained by using different random number seeds) are plotted along with the (deterministic) o.d.e. solution. We find that the o.d.e. solution approximates the mean evolution of the original process well.

Fig. 7: Trajectory of the fluid limit plotted along with samplepaths of the scaled process for .

Fig. 8: Trajectory of the fluid limit plotted along with multiple runs of the original process (normalized) for .

IV Uniform Threshold Distribution

In this section, we will consider the o.d.e. approximation of the HILT process under the uniform distribution of threshold. The hazard function for the uniform distribution on $[0,1]$ is given by $h_F(x) = \frac{1}{1-x}$ for $x < 1$, and thus the system of o.d.e. becomes,

$$\dot{b} = d$$
$$\dot{d} = -d + \frac{\gamma\,(1 - b - d)\, d}{1 - \gamma b}$$

It turns out that we can explicitly solve the above system, thus yielding closed form expressions for $(b(t), d(t))$. We derive these closed form expressions below and use them in the sections that follow.

IV-A Solution to the o.d.e.

On solving the o.d.e. for the uniform distribution, with initial conditions $(b(0) = 0, d(0) = d_0)$ and defining $r = 1 - \gamma(1 - d_0)$, we get

$$b(t) = \frac{d_0}{r}\left(1 - e^{-rt}\right)$$
$$d(t) = d_0\, e^{-rt}$$

In Appendix D we provide the steps involved in obtaining the solution.

IV-B Terminal spread of influence

The following theorem results from a simple observation of the o.d.e.’s.

Theorem 2

Given that we start with $d_0$ fraction of nodes in the infectious set in an HILT network with parameter $\gamma$, then the final fraction of activated nodes will be $b_\infty = \frac{d_0}{r}$ where $r = 1 - \gamma(1 - d_0)$.
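The observation behind this result can be sketched as follows. The derivation assumes that the uniform-threshold o.d.e. reads $\dot{b} = d$, $\dot{d} = -d + \gamma(1-b-d)d/(1-\gamma b)$ with $b(0) = 0$ and $d(0) = d_0$, and introduces the inactive fraction $s = 1 - b - d$, a symbol used here only for convenience.

```latex
\begin{align*}
s &:= 1 - b - d, \qquad
\dot{s} = -\dot{b} - \dot{d} = -\frac{\gamma\, s\, d}{1 - \gamma b}, \qquad
\dot{b} = d, \\
\frac{ds}{db} &= -\frac{\gamma\, s}{1 - \gamma b}
\;\Longrightarrow\;
s = (1 - d_0)(1 - \gamma b)
\quad \text{(using } b(0) = 0,\ s(0) = 1 - d_0\text{)}, \\
d = 0 &\;\Longrightarrow\;
1 - b_\infty = (1 - d_0)(1 - \gamma b_\infty)
\;\Longrightarrow\;
b_\infty = \frac{d_0}{1 - \gamma(1 - d_0)}.
\end{align*}
```

The same conserved relation $s = (1 - d_0)(1 - \gamma b)$ also gives $d = d_0 - \big(1 - \gamma(1 - d_0)\big) b$, which turns the system into a single linear o.d.e. in $b$.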

Remarks:

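  • We might also be interested in the question of choosing the right $d_0$ which can give us the required $b_\infty$, and we see that $d_0 = \frac{b_\infty (1 - \gamma)}{1 - \gamma b_\infty}$.

  • We observe that, for large $N$, as long as $\gamma < 1$ we cannot influence the entire population (i.e., $b_\infty < 1$) unless we start off with the entire population active (i.e., $d_0 = 1$). But if $\gamma = 1$ then $b_\infty = 1$ provided $d_0 > 0$.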
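The two remarks above can be checked numerically with a short sketch. It assumes the terminal spread under uniform thresholds is d0 / (1 - gamma * (1 - d0)) and that the required seeding is obtained by inverting this expression; the function names are hypothetical.

```python
def terminal_spread(d0, gamma):
    """Assumed fluid-limit terminal fraction of activated nodes."""
    return d0 / (1.0 - gamma * (1.0 - d0))

def required_seed(b_target, gamma):
    """Initial fraction needed to end at b_target (needs gamma * b_target < 1)."""
    return b_target * (1.0 - gamma) / (1.0 - gamma * b_target)

spread = terminal_spread(0.1, 0.5)        # seed 10% of nodes, gamma = 0.5
seed_back = required_seed(spread, 0.5)    # inverting recovers the seeding
full_spread = terminal_spread(0.01, 1.0)  # gamma = 1: any positive seed suffices
```

For gamma below 1, `terminal_spread` reaches 1 only when `d0` equals 1, which matches the second remark.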

Consider the discrete influence process (the Kempe model [1]), and let be the expected size of the terminal set , starting with as the initial set in the network . Since all initial sets are equivalent in the HILT model, we will be interested in the influence of a set of size . Define , for all of size .

By using results from [19], we can show that,

The behaviour of as a function of and can be seen in Figure 9 (depicted by solid lines), for a network of 3000 nodes. We also superimpose the behavior of against (depicted by asterisks). We observe that there is an exact match, except for . For , as seen earlier, we know that as long as . This is however true only in the fluid limit, and hence the discrepancy for finite .

Fig. 9: versus for (shown by solid lines) and versus (shown by asterisks) for various values of .

Taking and , we can show that as , . See Appendix E for the proof. This provides another verification of the accuracy of the o.d.e. approximation for large .

V Time constrained optimization

While the analytical expression derived earlier for HILT gives only the expected size of the terminal set, the o.d.e. dynamics approximates the trajectory of influence evolution, for large $N$. This can be useful, especially in problem settings where the time taken by the process for the spread of influence is also considered, in addition to the size of the initial set.

Theorem 3

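Given the initial fraction $d_0$ of infected nodes in an HILT network with parameter $\gamma$, the time we have to wait to get at least $\bar{b}$ ($d_0 \le \bar{b} < b_\infty$) fraction of nodes active is given by,

$$T = \frac{1}{r} \ln\left( \frac{\gamma\, d_0 (1 - d_0)}{d_0 - r\,\bar{b}} \right)$$

where $r = 1 - \gamma(1 - d_0)$.

Proof:

Firstly, note that since $b(t) + d(t) < b_\infty$ for every finite $t$, $\bar{b}$ must be less than $b_\infty$. Since we are observing the process at a finite time $T$, $d(T)$ is not zero. Hence, we should look at the value of $b(T) + d(T)$ and set it to $\bar{b}$. We get,

$$\frac{d_0}{r}\left(1 - e^{-rT}\right) + d_0\, e^{-rT} = \bar{b}$$

Rearranging terms, we get the expression for $T$.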

A more interesting question would be to determine the to be chosen so that by time we will have at least fraction of the nodes activated, in the HILT network with parameter . Unfortunately, we will not be able to get a closed form expression for this, and it can be solved numerically using the following fixed point equation.

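We can use the iterative bisection method to obtain the fixed point of the above equation. Writing $\bar{b}$ for the target fraction and $T$ for the time needed to reach it, let $d_{\min} = \frac{\bar{b}(1 - \gamma)}{1 - \gamma \bar{b}}$ and $d_{\max} = \bar{b}$. We know that the $d_0$ that solves the equation will lie in $[d_{\min}, d_{\max}]$ and that the solution is unique, since $T$ is a monotonic function in $d_0$. We also know that for $d_0 = d_{\min}$, $T = \infty$, and for $d_0 = d_{\max}$, $T = 0$.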

Under the above conditions, we find that the bisection method will converge to . This is shown as Algorithm 1. The method is illustrated in Figure 10 for parameters , , .

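lower ← lower end of the search interval for the initial fraction;
upper ← upper end of the search interval for the initial fraction;
while 1 do
       mid ← (lower + upper)/2;
       if the time needed to reach the target starting from mid exceeds the available time then
             lower ← mid;
       else
             upper ← mid;
       end
       if upper − lower < tolerance then
             break;
       end
end
return mid;
Algorithm 1 Iterative Bisection method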
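A runnable sketch of the bisection idea is given below. It is not a transcription of Algorithm 1: it assumes the closed-form uniform-threshold solution b(t) = (d0/r)(1 - exp(-r t)), d(t) = d0 exp(-r t) with r = 1 - gamma (1 - d0), bisects directly on the active fraction b + d at the deadline, and uses hypothetical function names. It requires 0 < b_target < 1 and 0 < gamma <= 1.

```python
import math

def active_fraction(t, d0, gamma):
    """b(t) + d(t) under the assumed closed-form uniform-threshold solution."""
    r = 1.0 - gamma * (1.0 - d0)
    decay = math.exp(-r * t)
    return (d0 / r) * (1.0 - decay) + d0 * decay

def time_to_target(d0, b_target, gamma):
    """Time at which b + d first equals b_target (needs d0 <= b_target < d0 / r)."""
    r = 1.0 - gamma * (1.0 - d0)
    return math.log(gamma * d0 * (1.0 - d0) / (d0 - r * b_target)) / r

def seed_for_deadline(b_target, deadline, gamma, tol=1e-10):
    """Bisect for the initial fraction d0 that reaches b_target at the deadline."""
    # Lower end: this seeding reaches b_target only as time goes to infinity.
    lo = b_target * (1.0 - gamma) / (1.0 - gamma * b_target)
    # Upper end: this seeding already meets the target at time zero.
    hi = b_target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if active_fraction(deadline, mid, gamma) < b_target:
            lo = mid  # too few seeds: target not met by the deadline
        else:
            hi = mid  # enough seeds: try a smaller initial fraction
    return 0.5 * (lo + hi)

d0_star = seed_for_deadline(b_target=0.5, deadline=5.0, gamma=0.8)
```

Bisecting on the active fraction rather than on the waiting time avoids evaluating a logarithm whose argument becomes unbounded near the lower end of the interval.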

Fig. 10: Evaluating by Iterative Bisection Method

The variation of with respect to the parameters , and can be seen in Figures 11,12,13.

In Figure 11, note that for , as expected, , i.e., since there is no social interaction (), our terminal spread of influence will be equal to the initial seeding. Also, note that as the target time is reduced, we require higher values of to achieve the same (for , ). Finally, for , asymptotically approaches for large . From Figure 12, we see that as increases, for a given , the required monotonically increases. Finally, Figure 13, shows that the behavior for is qualitatively similar to the one in Figure 9 depicting .

Fig. 11: Variation of across for various values of with

Fig. 12: Variation of across for various values of with

Fig. 13: Variation of across for various values of at

VI Effect of the Threshold distribution

Recall that the evolution of influence is given by:

$$\dot{b} = d$$
$$\dot{d} = -d + \gamma\, h_F(\gamma b)\, (1 - b - d)\, d$$

Note that the evolution depends on the distribution of threshold via its hazard function,

$$h_F(x) = \frac{f(x)}{1 - F(x)}$$

where $f$ and $F$ are the probability density and cumulative distribution functions of the threshold distribution, respectively. Hazard functions are widely used in failure/survival analysis. In this section, we will consider threshold distributions with different hazard function characteristics, and study the spread of influence.

As indicated earlier, the o.d.e. derived is valid for any $\gamma > 0$, and in this section we will also consider cases when $\gamma > 1$, while discussing threshold distributions with unbounded support. However, for the uniform threshold distribution, we will restrict $\gamma \le 1$, since in this case $h_F(x) = \frac{1}{1-x}$, valid only for $x < 1$.

VI-A Exponential distribution

The exponential distribution is widely used in scenarios where there is need for a constant hazard rate. This is also due to the fact that the exponential distribution is the only memoryless continuous distribution. Consider the threshold distributed as exponential with parameter $\lambda$. We have

$$f(x) = \lambda e^{-\lambda x}, \qquad F(x) = 1 - e^{-\lambda x}$$

Thus we get $h_F(x) = \lambda$. Plugging this in the o.d.e. expression we get,

$$\dot{b} = d$$
$$\dot{d} = -d + \gamma \lambda\, (1 - b - d)\, d$$

Observe that the above system of o.d.e. is equivalent to the dynamics of an SIR (Susceptible-Infective-Recovered) epidemic, with infection rate $\gamma\lambda$ and recovery rate $1$ [15]. The $b(t)$ and $d(t)$ processes respectively are equivalent to the Recovered and Infective processes of the SIR epidemic model. Thus we see that under the exponential distribution of threshold, the Linear Threshold model, in its fluid limit, is equivalent to a special case of the SIR model. This equivalence provides a hitherto undocumented link between influence spread models from viral marketing literature (Linear Threshold model) and a traditional epidemic model (SIR model).
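The stated equivalence can be checked numerically by stepping both systems side by side. The sketch below assumes the exponential-threshold o.d.e. is db/dt = d, dd/dt = -d + gamma * lam * (1 - b - d) * d, and compares it with a standard SIR system with infection rate gamma * lam and recovery rate 1; the forward Euler scheme, step size and function names are illustrative choices.

```python
def hilt_exponential_rhs(b, d, gamma, lam):
    """Assumed HILT fluid limit with exponential(lam) thresholds."""
    hazard = lam  # exponential thresholds have a constant hazard rate
    return d, -d + gamma * hazard * (1.0 - b - d) * d

def sir_rhs(s, i, r, beta, mu):
    """Standard SIR dynamics with infection rate beta and recovery rate mu."""
    new_infections = beta * s * i
    return -new_infections, new_infections - mu * i, mu * i

def integrate_pair(gamma, lam, d0, dt=0.001, t_end=40.0):
    """Forward-Euler integration of both systems from matching initial states."""
    b, d = 0.0, d0
    s, i, r = 1.0 - d0, d0, 0.0
    for _ in range(int(round(t_end / dt))):
        db, dd = hilt_exponential_rhs(b, d, gamma, lam)
        ds, di, dr = sir_rhs(s, i, r, beta=gamma * lam, mu=1.0)
        b, d = b + dt * db, d + dt * dd
        s, i, r = s + dt * ds, i + dt * di, r + dt * dr
    return (b, d), (s, i, r)

(b_end, d_end), (s_end, i_end, r_end) = integrate_pair(gamma=1.5, lam=2.0, d0=0.05)
```

In this comparison `b` tracks the Recovered fraction, `d` the Infective fraction, and `1 - b - d` the Susceptible fraction.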

Figures 14(a) and 14(b) compare the influence evolution under uniform and exponential distribution of threshold. Note that for the same mean threshold ($1/2$) and a smaller value of $\gamma$ (Figure 14(a)), the exponential case yields a larger terminal influence spread. This is because, under the exponential distribution, there are more nodes with threshold close to zero. This also explains the steeper initial increase in influence for the exponential distribution compared to the uniform distribution case. In fact, from the respective o.d.e.s it is clear that for the uniform distribution, the hazard rate at zero is half that of the exponential distribution with the same mean.

But, for larger values of $\gamma$ (Figure 14(b)), the uniform distribution yields a larger terminal influence spread. This is because, in the uniform case, the thresholds are bounded above by $1$, while in the exponential case, the support set for thresholds is unbounded. Thus, under the uniform distribution, as $\gamma$ approaches $1$, the terminal spread of influence approaches $1$ (as noted in Section IV-B).

(a) small $\gamma$ regime
(b) large $\gamma$ regime
Fig. 14: Comparison of influence spread between Uniform threshold distribution and Exponential threshold distribution with the same mean. We still use $\gamma \le 1$, since we are dealing with the uniform distribution.

VI-B Weibull distribution

Another distribution which is widely used in survival analysis is the Weibull distribution. The probability density function of a Weibull random variable is given by,

$$f(x; \lambda, k) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}}, \qquad x \ge 0$$

If the random variable is the time to failure, then under the Weibull distribution, the failure rate is proportional to a power of time. The hazard function is given by,

$$h(x; \lambda, k) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}$$

In the above expression $\lambda$ is often referred to as the scale parameter and $k$ is referred to as the shape parameter. Figure 15(a) shows the probability density function of the Weibull distribution for various values of $k$. Note that for larger values of $k$, there is a significantly higher number of users with higher values of threshold, i.e., users who are less susceptible to the spread of influence. The hazard rate for the Weibull distribution can be increasing, constant or decreasing depending on the value of $k$. This is demonstrated in Figure 15(b).

  • $k < 1$ leads to a decreasing hazard rate. This implies that nodes are less likely to become activated by an instantaneous influence, as the existing influence (which failed to activate the node) on them increases.

  • $k = 1$ yields a constant hazard rate, and in that case the Weibull distribution is just the exponential distribution.

  • $k > 1$ yields an increasing hazard rate, which implies nodes are more likely to become activated by an instantaneous influence, as the existing influence on them increases.
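The three regimes listed above can be verified directly from the standard Weibull hazard, h(x) = (k / lam) * (x / lam) ** (k - 1). The sketch below is illustrative; the function name and the argument names `shape` and `scale` are our own labels for k and lam.

```python
def weibull_hazard(x, shape, scale=1.0):
    """Weibull hazard rate at x > 0 for shape k and scale lam.

    For shape < 1 the hazard diverges at x = 0, so x must be strictly positive.
    """
    return (shape / scale) * (x / scale) ** (shape - 1.0)

decreasing = [weibull_hazard(x, 0.5) for x in (0.5, 1.0, 2.0)]  # shape < 1
constant = [weibull_hazard(x, 1.0, scale=2.0) for x in (0.5, 1.0, 2.0)]  # shape = 1
increasing = [weibull_hazard(x, 2.0) for x in (0.5, 1.0, 2.0)]  # shape > 1
```

With shape equal to 1 the hazard reduces to the constant 1 / scale, which is the exponential case.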

(a) Probability density function
(b) Hazard function
Fig. 15: Weibull distribution for different values of

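The HILT o.d.e. under the Weibull distribution of threshold, with shape parameter $k$ and scale parameter $\lambda$, can be written as follows:

$$\dot{b} = d$$
$$\dot{d} = -d + \gamma\, \frac{k}{\lambda}\left(\frac{\gamma b}{\lambda}\right)^{k-1} (1 - b - d)\, d$$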

Figures 16(a) and 16(b) demonstrate the evolution of the o.d.e under the Weibull distribution of threshold, for different values of in the small and large regimes for . For smaller (Figure 16(a)), we observe that as increases, the spread of influence decreases. This is expected, since from Figure 15(a) it is clear that, for larger values of , Weibull distribution puts more mass on larger values of threshold, i.e., nodes are less susceptible to influence. Further, for , Figure 16(a) shows that the total spread of influence is , equal to the initial seeding . This implies the influence does not spread at all, since the node thresholds are much higher, compared to the net influence generated by (due to smaller ).

For larger (Figure 16(b)), we see that the trend is reversed, i.e., as increases, the spread of influence increases. It is to be noted that the near is larger for smaller , similar to the small regime. However, from Figures 15(a) and 15(b) we see that smaller values of have heavier tails (and lower hazard rates), thus leading to stagnation of influence after the initial surge.

Another interesting feature to note is that, unlike the small regime, for , we get a much higher influence spread. Also, unlike other values of , here exhibits a non-monotonic behavior even after it begins to decrease, i.e., is not unimodal. Such behavior has not been observed until now in the classic epidemiology framework, especially in a homogeneous setting. In traditional epidemic models like SIR, the I process (equivalent to ) might exhibit an initial increase, but once it begins decreasing, continues to steadily decrease to zero. But, in our dynamics, the presence of hazard rate (increasing, in this case) leads to such non-unimodal characteristics of .

(a) small regime
(b) large regime
Fig. 16: Comparison of influence spread between Weibull threshold distributions with different values of

VI-C Loglogistic distribution

The loglogistic distribution is the probability distribution of a random variable whose logarithm follows the logistic distribution. It has similar shape characteristics to the log-normal distribution, but has heavier tails. The probability density function and the hazard function are given by,

$$f(x; \alpha, \beta) = \frac{(\beta/\alpha)(x/\alpha)^{\beta-1}}{\left(1 + (x/\alpha)^{\beta}\right)^{2}}, \qquad h(x; \alpha, \beta) = \frac{(\beta/\alpha)(x/\alpha)^{\beta-1}}{1 + (x/\alpha)^{\beta}}$$

The parameter $\alpha$ functions as the scale parameter and $\beta$ is referred to as the shape parameter. Also for $\beta > 1$, the distribution is unimodal, and is more concentrated as $\beta$ increases (see Figure 17(a)).

Similar to the Weibull distribution, one can obtain different failure characteristics by tuning the $\beta$ parameter. For $\beta \le 1$, the hazard rate decreases monotonically. But unlike the Weibull distribution, for $\beta > 1$, the hazard function exhibits non-monotonic behavior (see Figure 17(b)).
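The non-monotonic hazard can be seen with a short numerical sketch. It uses the standard loglogistic hazard with scale alpha and shape beta, whose peak for beta > 1 lies at x = alpha * (beta - 1) ** (1 / beta); the function and argument names are illustrative.

```python
def loglogistic_hazard(x, shape, scale=1.0):
    """Loglogistic hazard rate at x > 0 for shape beta and scale alpha."""
    z = (x / scale) ** shape
    # Equivalent to (beta/alpha) * (x/alpha)**(beta-1) / (1 + (x/alpha)**beta).
    return (shape / x) * z / (1.0 + z)

def hazard_peak_location(shape, scale=1.0):
    """Location of the hazard maximum; exists only for shape > 1."""
    if shape <= 1.0:
        raise ValueError("the hazard is monotonically decreasing for shape <= 1")
    return scale * (shape - 1.0) ** (1.0 / shape)

rising_then_falling = [loglogistic_hazard(x, 2.0) for x in (0.5, 1.0, 2.0)]
always_falling = [loglogistic_hazard(x, 0.5) for x in (0.5, 1.0, 2.0)]
peak = hazard_peak_location(2.0)
```

For shape 2 and unit scale the hazard rises up to x = 1 and falls afterwards, in contrast to the Weibull hazard with shape above 1, which keeps increasing.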

(a) Probability density function
(b) Hazard function
Fig. 17: loglogistic distribution for different values of

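The HILT o.d.e. under the loglogistic distribution of threshold, with scale parameter $\alpha$ and shape parameter $\beta$, can be written as follows:

$$\dot{b} = d$$
$$\dot{d} = -d + \gamma\, \frac{(\beta/\alpha)(\gamma b/\alpha)^{\beta-1}}{1 + (\gamma b/\alpha)^{\beta}}\, (1 - b - d)\, d$$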

Figures 18(a) and 18(b) demonstrate the evolution of the influence spread o.d.e. under the loglogistic distribution of threshold, for different values of in the small and large regimes for . We note that for small , the evolution of influence is qualitatively similar, but under the loglogistic distribution, we get a smaller influence spread, due to heavier tails. Also, in the large regime, we note that for , we again get a non-unimodal behavior for . But the second peak is less pronounced in the loglogistic distribution than the Weibull distribution, since the loglogistic distribution exhibits a non-monotonic hazard rate.

(a) small regime
(b) large regime
Fig. 18: Comparison of influence spread between loglogistic threshold distributions with different values of

Thus we see that the incorporation of the hazard rate into the o.d.e. (resulting from a fluid limit characterization of the LT model) yields qualitatively different characteristics compared to the standard epidemic models. To the best of our knowledge, this is the first work that analytically characterizes the evolution of influence under different threshold distributions. This is also the first work to incorporate hazard functions into the epidemic models, thus providing a way to capture heterogeneity in the population. Further, due to the one-to-one correspondence between a given hazard function and its corresponding cumulative distribution [13], one can begin with the hazard function in the o.d.e. (obtained by curve-fitting to existing epidemic data) and ascertain the threshold distribution of the population.
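The correspondence between a hazard function $h$ and its cumulative distribution $F$ is the standard identity $F(x) = 1 - \exp\bigl(-\int_0^x h(u)\,du\bigr)$. The sketch below illustrates how a threshold distribution could be recovered numerically from a hazard function; the function names, the midpoint quadrature, and the Weibull parameter names `scale` and `shape` are assumptions made for this example, and the hazard is supplied in closed form rather than fitted to data.

```python
import math


def cdf_from_hazard(hazard, x, steps=10000):
    """Recover F(x) = 1 - exp(-integral of hazard over [0, x]).

    The integral uses the midpoint rule, which never evaluates the hazard
    at 0 and so tolerates hazards that blow up at the origin.
    """
    width = x / steps
    cumulative = sum(hazard((k + 0.5) * width) for k in range(steps)) * width
    return 1.0 - math.exp(-cumulative)


def weibull_hazard(x, scale, shape):
    """Weibull hazard (shape / scale) * (x / scale) ** (shape - 1)."""
    return (shape / scale) * (x / scale) ** (shape - 1)


def weibull_cdf(x, scale, shape):
    """Closed-form Weibull distribution function, used only as a reference."""
    return 1.0 - math.exp(-((x / scale) ** shape))


# Compare the recovered distribution with the closed form at a few points.
for point in (0.5, 1.0, 1.5):
    recovered = cdf_from_hazard(lambda u: weibull_hazard(u, 1.0, 3.0), point)
    print(point, recovered, weibull_cdf(point, 1.0, 3.0))
```

With a hazard estimated from data, the same routine would apply, with the accuracy then limited by the quality of the fit rather than by the quadrature.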

VII Multiclass HILT model

A natural extension to the HILT model would be to consider the evolution of information spread in a heterogeneous network. Such a scenario might arise in a network with communities, where the interactions within a community might be stronger than the interactions across communities. These have been traditionally studied under the term stratified epidemics [20]. Consider a network with communities and let denote the number of nodes in each community. Let be the influence matrix, whose entries indicate the strength of influence from community to community (see Figure 19).

Fig. 19: A heterogeneous network with three communities, shown with entries of the influence matrix . Nodes , and belong to communities , and , and have their thresholds distributed according to , and respectively.

As earlier, we will appropriately normalize the edge weights, i.e., for and , , where is the total population size. Let the nodes within community have their thresholds distributed according to , with hazard function . We can then carry out an analysis similar to what was done for the HILT model in Section III. We can show that the joint evolution is a Markov process, and we construct a scaled process using the minislots approach and with appropriate probability scaling. Here the attempt probability of all infectious nodes during a given mini-slot scales as , irrespective of which community they belong to. By applying Kurtz’s theorem to the scaled process, we obtain the o.d.e.s representing the influence evolution. Let denote the non-infectious and infectious active nodes within community . We can then describe their evolution by the following system of o.d.e.s similar to Equations 2 and 3 ():

where , and . One possible objective function to maximize in this scenario would be the total spread of influence , by suitably choosing the initial subject to the constraint , for fixed system parameters, i.e., the threshold distributions and the influence matrix . We were unable to obtain a universal analytical solution for this problem, but we demonstrate numerically that, depending on the system parameters, the results can be quite counter-intuitive.

Fig. 20: Evolution of influence in the two communities when the initial seeding is done in the smaller community (i.e., )

Fig. 21: Evolution of influence in the two communities when the initial seeding is done in the larger community (i.e., )

For instance, consider a two-community network with and as the relative community sizes. Let all the nodes in the population have their thresholds distributed according to an exponential distribution with parameter . Also assume that and for . Figures 20 and 21 show the evolution of for , for different initial conditions. While in Figure 20 (scenario 1) the entire initial seeding is done in the smaller community (i.e., ), in Figure 21 (scenario 2) the entire initial seeding is done in the larger community. We see that the total spread of influence in scenario 1 is larger than in scenario 2. Further, from Figure 22, it is clear that the optimal seeding for this setting is approximately . It is surprising that we get a wider spread of influence by investing more in the smaller community. Thus we see that, even in a simple two-community setting, the optimal seeding might be counter-intuitive. It would be an interesting future direction to analytically obtain the optimal seeding, given the influence matrix and the threshold distributions .
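A seeding comparison of this kind can be sketched numerically. The code below is an illustration under stated assumptions, not a reproduction of the experiment: it assumes that, for exponential thresholds (constant hazard `lam`), the multiclass o.d.e.s take a stratified SIR form in which infectious nodes `b[i]` become non-infectious `d[i]` at unit rate and inactive nodes in community `i` are activated at rate `lam * sum_j gamma[j][i] * b[j]`. The community sizes, the influence matrix, the hazard value, and the seeding budget are all hypothetical.

```python
def derivative(state, sizes, gamma, lam):
    """Right-hand side of the assumed stratified SIR form.

    state = (b1, b2, d1, d2): infectious and non-infectious active fractions.
    """
    b = state[:2]
    d = state[2:]
    db = []
    for i in range(2):
        inactive = sizes[i] - b[i] - d[i]
        # gamma[j][i] is read as the influence strength from community j to i.
        pressure = lam * sum(gamma[j][i] * b[j] for j in range(2))
        db.append(-b[i] + inactive * pressure)
    return (db[0], db[1], b[0], b[1])


def rk4_step(state, dt, sizes, gamma, lam):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(slope, factor):
        return tuple(x + factor * k for x, k in zip(state, slope))

    k1 = derivative(state, sizes, gamma, lam)
    k2 = derivative(shifted(k1, dt / 2), sizes, gamma, lam)
    k3 = derivative(shifted(k2, dt / 2), sizes, gamma, lam)
    k4 = derivative(shifted(k3, dt), sizes, gamma, lam)
    return tuple(
        x + dt / 6 * (a + 2 * p + 2 * q + r)
        for x, a, p, q, r in zip(state, k1, k2, k3, k4)
    )


def simulate(sizes, gamma, lam, seed, t_end=100.0, dt=0.05):
    """Integrate from the seeding (b1, b2) with no non-infectious nodes."""
    state = (seed[0], seed[1], 0.0, 0.0)
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt, sizes, gamma, lam)
    return state


def total_spread(sizes, gamma, lam, seed):
    """Total active fraction (infectious plus non-infectious) at the horizon."""
    return sum(simulate(sizes, gamma, lam, seed))


# Hypothetical parameters, chosen only to make the sketch runnable.
sizes = (0.3, 0.7)
gamma = ((2.0, 0.5), (0.5, 2.0))
lam = 1.0
budget = 0.05

spread_small = total_spread(sizes, gamma, lam, (budget, 0.0))
spread_large = total_spread(sizes, gamma, lam, (0.0, budget))
print("seed smaller community:", spread_small)
print("seed larger community:", spread_large)
```

Sweeping the split of `budget` between the two communities with `total_spread` would give a curve of the kind described for the optimal seeding; which allocation wins depends on the parameters chosen, so no ordering is asserted here.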

Fig. 22: Total spread of influence for various allocations of initial seeding and different relative community sizes.

VIII Conclusion

In this work, we began with a homogeneous version of the Linear Threshold model proposed by Kempe et al. [1] in the context of viral marketing, and generalized it to arbitrary threshold distributions. We observed that the spread of influence evolves as a discrete time Markov chain. Under a certain scaling, we showed that the scaled Markov chain converges (in the sense of [12]) to a deterministic trajectory defined by an o.d.e. The threshold distribution appears in this o.d.e. through its hazard rate function. We described how this approach complements the fixed point equation suggested by Granovetter [3], thus providing a link between two threads in the threshold model literature. Also, under the exponential distribution of threshold, we showed that the derived fluid dynamics are equivalent to the well-known SIR model in epidemiology. We also numerically demonstrated how incorporating the hazard function into the o.d.e. can provide qualitatively different characteristics compared to traditional epidemic models, even in a homogeneous setting. One interesting future direction is to incorporate the degree distribution of the underlying network into the fluid dynamics. Further, one can carry out a similar analysis for influence processes with a general threshold function (instead of a linear one), as indicated in [1]. Also, using available social network data and controlled experiments, one could validate or suggest improvements to the threshold model, in order to fit real-world dynamics.

References

  • [1] D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In ACM SIGKDD, 2003.
  • [2] David J Bartholomew. Stochastic models for social processes. Wiley New York, 1967.
  • [3] Mark Granovetter. Threshold models of collective behaviour. American Journal of Sociology, 1978.
  • [4] E. M. Rogers. Diffusion of innovations. New York: Free Press, 1962.
  • [5] Thomas W Valente. Social network thresholds in the diffusion of innovations. Social networks, 18(1):69–89, 1996.
  • [6] Norman TJ Bailey. The mathematical theory of infectious diseases and its applications. Charles Griffin, 1975.
  • [7] Renato E Mirollo and Steven H Strogatz. Synchronization of pulse-coupled biological oscillators. SIAM Journal on Applied Mathematics, 50(6):1645–1662, 1990.
  • [8] Liangjie Hong, Ovidiu Dan, and Brian D Davison. Predicting popular messages in Twitter. In Proceedings of the 20th International Conference Companion on World Wide Web, pages 57–58. ACM, 2011.
  • [9] Gabor Szabo and Bernardo A Huberman. Predicting the popularity of online content. Communications of the ACM, 53(8):80–88, 2010.
  • [10] Pedro Domingos and Matt Richardson. Mining the network value of customers. In ACM SIGKDD, 2001.
  • [11] Eitan Altman, Francesco De Pellegrini, Rachid El-Azouzi, Daniele Miorandi, and Tania Jimenez. Emergence of equilibria from individual strategies in online content diffusion. In Fifth International Workshop on Network Science for Communication Networks (NetSciCom), 2013.
  • [12] Thomas G. Kurtz. Solutions of ordinary differential equations as limits of pure jump Markov processes, 1970.
  • [13] D. R. Cox. Renewal Theory. Methuen & Co. Ltd. Science Paperbacks, 1961.
  • [14] JM Morton, JN Dups, ND Anthony, and JF Dwyer. Epidemic curve and hazard function for occurrence of clinical equine influenza in a closed population of horses at a 3-day event in southern Queensland, Australia, 2007. Australian Veterinary Journal, 89(s1):86–88, 2011.
  • [15] D. J. Daley and J. Gani. Epidemic Modelling: An Introduction. Cambridge University Press, 2001.
  • [16] Roger V Gould. Collective action and network structure. American Sociological Review, pages 182–196, 1993.
  • [17] C. Bordenave, D. McDonald, and A. Proutiere. Random multi-access algorithms, a mean field analysis. In 43rd Allerton Conference, 2005.
  • [18] R.W.R. Darling. Fluid limits of pure jump Markov processes: A practical guide. Available at arxiv.org/pdf/math/0210109, 2002.
  • [19] Srinivasan Venkatramanan and Anurag Kumar. Information dissemination in socially aware networks under the linear threshold model. In Communications (NCC), 2011 National Conference on, pages 1–5. IEEE, 2011.
  • [20] RK Watson. On an epidemic in a stratified population. Journal of Applied Probability, pages 659–666, 1972.

Appendix A DTMC

Let denote the entire history of the processes up to time , i.e., . To obtain the expected drift of , consider,

Let and , then we have in the HILT model and . Hence we can write,

From the above equations we can clearly see that is a DTMC on the state space .

Appendix B Scaling the HILT Model

In this section, we will demonstrate the necessity for a probabilistic scaling (in addition to the amplitude and time scaling) to arrive at the mean drift expressions. Let denote the entire history of the processes up to time , i.e., . Begin with the drift equations for the unscaled process .

We shall now try scaling the process in the usual way, i.e., by scaling down the amplitude by a factor of , , . The evolution equations can then be written down as follows:

Using , and , we can write the drift function as,

where both and are fractions taking values from . It is clear that diverges with but we want this quantity to converge to a function (which is independent of ) so that we can apply Kurtz’s theorem to obtain an approximating ODE. The problem in the above case arises because the drift function in the original process scales with the state. Hence, while scaling, we need to slow down the process by another factor of . For this purpose, we use the probabilistic attempt model in our scaling. The same scaling has been used in the literature in the context of the analysis of Random Multi-Access Algorithms by Bordenave et al. [17]. Note that this modifies the dynamics of the original process. The o.d.e. will be the limit (in probability) of the stochastic process with modified dynamics as but will be a heuristic approximation for the stochastic process with the original dynamics.

Appendix C Proof of Theorem 1

Kurtz’s theorem [12] provides us a way by which we can approximate the evolution of a pure jump Markov process by the solution of a derived ODE. In this paper we shall refer to [18] for an equivalent version of Kurtz’s theorem, which is simpler to handle. It can be restated as follows to be directly used in our context.

Theorem 4

Given that,

  • is Lipschitz

  • where , with .

  • and where
    is the history of the process up to time .

  • and

then we have for each and each ,

where and are defined as the solutions of the system of ODEs,

with initial conditions .

  • Lipschitz property
    Consider,

    We see that each of the terms above is bounded when . Thus the norm of the Jacobian is uniformly bounded, and it follows that is Lipschitz.

  • Uniform Convergence

    By definition, and hence the uniform convergence of to in the domain is straightforward.

  • Bounded Noise variance
    We can write the noise variances as follows: