Conjoining Speeds up Information Diffusion in Overlaying Social-Physical Networks

Abstract

We study the diffusion of information in an overlaying social-physical network. Specifically, we consider the following set-up: There is a physical information network where information spreads amongst people through conventional communication media (e.g., face-to-face communication, phone calls), and conjoint to this physical network, there are online social networks where information spreads via web sites such as Facebook, Twitter, FriendFeed, YouTube, etc. We quantify the size and the critical threshold of information epidemics in this conjoint social-physical network by assuming that information diffuses according to the SIR epidemic model. One interesting finding is that even if there is no percolation in the individual networks, percolation (i.e., information epidemics) can take place in the conjoint social-physical network. We also show, both analytically and experimentally, that the fraction of individuals who receive an item of information (started from an arbitrary node) is significantly larger in the conjoint social-physical network case, as compared to the case where the networks are disjoint. These findings reveal that conjoining the physical network with online social networks can have a dramatic impact on the speed and scale of information diffusion.

Key words: Information Diffusion, Coupled Social Networks, Percolation Theory, Random Graphs.

1 Introduction

1.1 Motivation

Modern society relies on basic physical network infrastructures, such as power stations, telecommunication networks and transportation systems. Recently, due to advances in communication technologies and cyber-physical systems, these infrastructures have become increasingly dependent on one another and have emerged as interdependent networks [1]. One archetypal example of such coupled systems is the smart grid where the power stations and the communication network controlling them are coupled together. See the pioneering work of Buldyrev et al. [2] as well as [3, 4, 5, 6] for a diverse set of models on coupled networks.

Apart from physical infrastructure networks, coupling can also be observed between different types of social networks. Traditionally, people are tied together in a physical information network through old-fashioned communication media, such as face-to-face interactions. On the other hand, recent advances in Internet and mobile communication technologies have enabled people to be connected more closely through online social networks. Indeed, people can now interact through e-mail or online chatting, or communicate through a website such as Facebook, Twitter, FriendFeed, YouTube, etc. Clearly, the physical information network and online social networks are not completely separate since people may participate in two or more of these networks at the same time. For instance, a person can forward a message to his/her online friends via Facebook and Twitter upon receiving it from someone through face-to-face communication. As a result, the information spread in one network may trigger the propagation in another network, and may result in a possible cascade of information. One conjecture is that due to this coupling between the physical and online social networks, today’s breaking news (and information in general) can spread at an unprecedented speed throughout the population, and this is the main subject of the current study.

Information cascades over coupled networks can deeply influence the patterns of social behavior. Indeed, people have become increasingly aware of the fundamental role of the coupled social-physical network as a medium for the spread of not only information, but also ideas and influence. Twitter has emerged as an ultra-fast source of news [7] and Facebook has attracted major businesses and politicians for advertising products or candidates. Several music groups or singers have gained international fame by uploading videos to YouTube. In almost all cases, a new video uploaded to YouTube, a rumor started in Facebook or Twitter, or a political movement advertised through online social networks, either dies out quickly or reaches a significant proportion of the population. In order to fully understand the extent to which these events happen, it is of great interest to consider the combined behavior of the physical information network and online social networks.

1.2 Related Work

Despite the fact that information diffusion has received a great deal of research interest from various disciplines for over a decade, there has been little study on the analysis of information diffusion across coupled networks; most of the works consider information propagation only within a single network. The existing literature on this topic is much too broad to survey here, but we will attempt to cover the works that are most relevant to our study. To this end, existing studies can be roughly classified into two categories. The first type of studies [8, 9, 10, 11, 12, 13] are empirical and analyze various aspects of information diffusion using large-scale datasets from existing online social networks. Some of the interesting questions that have been raised (and answered) in these references include “What are the roles of behavioral properties of the individuals and the strength of their ties in the dynamics of information diffusion?” [10, 11], “How do blogs influence each other?” [13], and “How does the topology of the underlying social network affect the spread of information?” [11].

The second type of studies [14, 15, 16, 17, 18, 19, 20] build mathematical models to analyze the mechanisms by which information diffuses across the population. These references study the spread of diseases (rather than information) in small-world networks [17, 18], scale-free networks [19], and networks with arbitrary degree distributions [20]. However, by the well-known analogy between the spread of diseases and information [21, 22, 23], their results also apply in the context of information diffusion. Another notable work in this group is [24] which studies the spread of rumors in a network with multiple communities.

Setting aside the information diffusion problem, there has been some recent interest in various properties of coupled (or interacting or layered) networks (see [3, 25, 26, 27, 28]). For instance, [3], [25] and [26] consider a layered network structure where the networks in distinct layers are composed of identical nodes. On the other hand, in [27], the authors studied the percolation problem in two interacting networks with completely disjoint vertex sets; their model is similar to the interdependent networks introduced in [2]. Recently, [28] studied the susceptible-infectious-susceptible (SIS) epidemic model in an interdependent network.

1.3 Summary of Main Contributions

The current paper belongs to the second type of studies introduced above and aims to develop a new theoretic framework towards understanding the characteristics of information diffusion across multiple coupled networks. Although empirical studies are valuable in their own right, the modeling approach adopted here reveals subtle relations between the network parameters and the dynamics of information diffusion, thereby allowing us to develop a fundamental understanding as to how conjoining multiple networks extends the scale of information diffusion. The interested reader is also referred to the article by Epstein [29] which discusses many benefits of building and studying mathematical models; see also [30].

For illustration purposes, we give the definitions of our model in the context of an overlaying social-physical network. Specifically, there is a physical information network where information spreads amongst people through conventional communication media (e.g., face-to-face communication, phone calls), and conjoint to this physical network, there are online social networks offering alternative platforms for information diffusion, such as Facebook, Twitter, YouTube, etc. In the interest of easy exposition, we focus on the case where there exists only one online social network along with the physical information network; see the Appendix for an extension to the multiple social networks case. We model the physical network and the social network as random graphs with specified degree distributions [31]. We assume that each individual in a population of size is a member of the physical network, and becomes a member of the social network independently with a certain probability. It is also assumed that information is transmitted between two nodes (that are connected by a link in any one of the graphs) according to the susceptible-infectious-recovered (SIR) model; see Section 2 for precise definitions.

Our main findings can be outlined as follows: We show that the overlaying social-physical network exhibits a “critical point” above which information epidemics are possible; i.e., a single node can spread an item of information (a rumor, an advertisement, a video, etc.) to a positive fraction of individuals in the asymptotic limit. Below this critical threshold, only small information outbreaks can occur and the fraction of informed individuals always tends to zero. We quantify the aforementioned critical point in terms of the degree distributions of the networks and the fraction of individuals that are members of the online social network. Further, we compute the probability that an item of information originating from an arbitrary individual will yield an epidemic along with the resulting fraction of individuals that are informed. Finally, in the cases where the fraction of informed individuals tends to zero (non-epidemic state), we compute the expected number of individuals that receive an item of information started from a single arbitrary node.

These results are obtained by mapping the information diffusion process to an equivalent bond percolation problem [32] in the conjoint social-physical network, and then analyzing the phase transition properties of the corresponding random graph model. This problem is intricate since the relevant random graph model corresponds to a union of coupled random graphs, and the results obtained in [20, 31] for single networks fall short of characterizing its phase transition properties. To overcome these difficulties, we introduce a multi-type branching process and analyze it through an appropriate extension of the method of generating functions [20].

To validate our analytical results, we also perform extensive simulation experiments on synthetic networks that exhibit similar characteristics to some real-world networks. In particular, we verify our analysis on networks with power-law degree distributions with exponential cut-off and on Erdős-Rényi (ER) networks [33]; it has been shown [34] that many real networks, including the Internet, exhibit power-law distributions with exponential cut-off. We show that conjoining the networks can significantly increase the scale of information diffusion even with only one social network. To give a simple example, consider a physical information network and an online social network that are ER graphs with respective mean degrees and , and assume that each node in is a member of independently with probability . If and , we show that information epidemics are possible in the overlaying social-physical network whenever . In stark contrast, this happens only if or when the two networks are disjoint. Furthermore, in a single ER network with , an information item originating from an arbitrary individual gives rise to an epidemic with probability (i.e., can reach at most of the individuals). However, if the same network is conjoined with an ER network with and , the probability of an epidemic becomes (indicating that up to of the population can be influenced). These results show that the conjoint social-physical network can spread an item of information to a significantly larger fraction of the population as compared to the case where the two networks are disjoint.

The above conclusions are predicated on the social network containing a positive fraction of the population. This assumption is indeed realistic since more than 50% of the adult population in the US use Facebook [11]. However, for completeness we also analyze (see Section 5) the case where the social network contains only nodes with . In that case, we show analytically that no matter how connected is, conjoining it to the physical network does not change the threshold and the expected size of information epidemics.

Our results provide a complete characterization of the information diffusion process in a coupled social-physical network, by revealing the relation between the network parameters and the most interesting quantities including the critical threshold, probability and expected size of information epidemics. To the best of our knowledge, there has been no work in the literature that studies information diffusion in overlay networks whose vertices are neither identical nor disjoint. We believe that our findings along this line shed light on information propagation across coupled social-physical networks.

1.4 Notation and Conventions

All limiting statements, including asymptotic equivalences, are understood with going to infinity. The random variables (rvs) under consideration are all defined on the same probability space . Probabilistic statements are made with respect to this probability measure , and we denote the corresponding expectation operator by . The mean value of a random variable is denoted by . We use the notation to indicate distributional equality, to indicate almost sure convergence and to indicate convergence in probability. For any discrete set we write for its cardinality. For a random graph we write for the number of nodes in its th largest connected component; i.e., stands for the size of the largest component, for the size of the second largest component, etc.

The indicator function of an event is denoted by . We say that an event holds with high probability (whp) if it holds with probability as . For sequences , we write as a shorthand for the relation , whereas means that there exists such that for all sufficiently large. Also, we have if , or equivalently, if there exists such that for all sufficiently large. Finally, we write if we have and at the same time.

1.5 Organization of the Paper

The rest of the paper is organized as follows. In Section 2, we introduce a model for the overlaying social-physical network. Section 3 summarizes the main results of the paper that deal with the critical point and the size of information epidemics. In Section 4, we illustrate the theoretical findings of the paper with numerical results and verify them via extensive simulations. In Section 5, we study information diffusion in an interesting case where only a sublinear fraction of individuals are members of the online social network. The proofs of the main results are provided in Sections 6 and 7. In the Appendix, we demonstrate an extension of the main results to the case where there are multiple online social networks.

2 System Model

2.1 Overlay Network Model

We consider the following model for an overlaying social-physical network. Let stand for the physical information network of human beings on the node set . Next, let stand for an online social networking web site, e.g., Facebook. We assume that each node in is a member of this auxiliary network with probability independently from any other node. In other words, we let

(1)

with denoting the set of human beings that are members of Facebook. With this assumption, it is clear that the vertex set of satisfies

(2)

by the law of large numbers (we consider the case where separately in Section 5).

We define the structure of the networks and through their respective degree distributions and . In particular, we specify a degree distribution that gives the properly normalized probabilities that an arbitrary node in has degree . Then, we let each node in have a random degree drawn from the distribution independently from any other node. Similarly, we assume that the degrees of all nodes in are drawn independently from the distribution . This corresponds to generating both networks (independently) according to the configuration model [33, 35]. In what follows, we shall assume that the degree distributions are well-behaved in the sense that all moments of arbitrary order are finite.

In order to study information diffusion amongst human beings, a key step is to characterize an overlay network that is constructed by taking the union of and . In other words, for any distinct pair of nodes , we say that and are adjacent in the network , denoted , as long as at least one of the conditions {} or {} holds. This is intuitive since a node can forward information to another node either by using old-fashioned communication channels (i.e., links in ) or by using Facebook (i.e., links in ). Of course, for the latter to be possible, both and should be Facebook users.

The overlay network constitutes an ensemble of the colored degree-driven random graphs proposed in [36]. Let be the space of possible colors (or types) of edges in ; specifically, we say the edges in Facebook are of type , while the edges in the physical network are said to be of type . The colored degree of a node is then represented by an integer vector , where (resp. ) stands for the number of Facebook edges (resp. physical connections) that are incident on node . Under the given assumptions on the degree distributions of and , the colored degrees (i.e., ) will be independent and identically distributed according to a colored degree distribution such that

(3)

due to independence of and . The term accommodates the possibility that a node is not a member of the online social network, in which case the number of -edges is automatically zero.

Given that the colored degrees are picked such that and are even, we construct as in [36, 20]: Each node is first given the appropriate number and of stubs of type and type , respectively. Then, pairs of these stubs that are of the same type are picked randomly and connected together to form complete edges; clearly, two stubs can be connected together only if they are of the same type. Pairing of stubs continues until none is left.
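The stub-pairing construction just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure of [36, 20] verbatim: the function names, the `draw_f` and `draw_w` degree samplers, and the choice to resample all degrees until both stub totals are even are assumptions made here for the sake of a runnable example. Self-loops and repeated edges are not filtered out.

```python
import random

def sample_colored_degrees(n, alpha, draw_f, draw_w, rng):
    """Return per-node stub counts (deg_f, deg_w) for the two edge types."""
    deg_f = [draw_f(rng) for _ in range(n)]
    # A node joins the social network with probability alpha;
    # non-members automatically get zero social-network stubs.
    deg_w = [draw_w(rng) if rng.random() < alpha else 0 for _ in range(n)]
    return deg_f, deg_w

def pair_stubs(degrees, rng):
    """Pair stubs of a single type uniformly at random; returns edges (u, v)."""
    stubs = [node for node, d in enumerate(degrees) for _ in range(d)]
    if len(stubs) % 2 == 1:
        raise ValueError("total degree must be even")
    rng.shuffle(stubs)
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

def build_overlay(n, alpha, draw_f, draw_w, seed=0):
    """Sample colored degrees (resampling until both totals are even, an
    assumption of this sketch) and pair stubs separately for each type."""
    rng = random.Random(seed)
    while True:
        deg_f, deg_w = sample_colored_degrees(n, alpha, draw_f, draw_w, rng)
        if sum(deg_f) % 2 == 0 and sum(deg_w) % 2 == 0:
            break
    return deg_f, deg_w, pair_stubs(deg_f, rng), pair_stubs(deg_w, rng)
```

Because stubs of different types are paired in separate passes, an edge can only ever join two stubs of the same type, as the construction requires.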

2.2 Information Propagation Model

Now, consider the diffusion of a piece of information in the overlay network which starts from a single node. We assume that information spreads from a node to its neighbors according to the SIR epidemic model. In this context, an individual is either susceptible (S) meaning that she has not yet received a particular item of information, or infectious (I) meaning that she is aware of the information and is capable of spreading it to her contacts, or recovered (R) meaning that she is no longer spreading the information. This analogy between the spread of diseases and spread of information in a network has long been recognized [21] and the SIR epidemic model is commonly used in similar studies; e.g., see [22] (diffusion of worms in online social networks), [21] (diffusion of information through blogs), and [23] (diffusion of files in peer-to-peer file sharing networks), among others.

The dynamics of information diffusion can now be described as in [20]: We assume that an infectious individual transmits the information to a susceptible contact with probability where

Here, denotes the average rate of being in contact over the link from to , and is the time keeps spreading the information; i.e., the time it takes for to become recovered.

It is expected that the information propagates over the physical and social networks at different speeds, which manifests as different transmission probabilities across links in this case. Specifically, let stand for the probability of information transmission over a link (between and and ) in and let denote the probability of information transmission over a link in . For simplicity, we assume that and are independent for all distinct pairs . Furthermore, we assume that the random variables and are independent and identically distributed (i.i.d.) with probability densities and , respectively. In that case, it was shown in [20, 37] that information propagates over as if all transmission probabilities were equal to , where is the mean value of ; i.e.,

We refer to as the transmissibility of the information over the physical network and note that . In the same manner, we assume that and are i.i.d. with respective densities and leading to a transmissibility of information over the online social network .
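As a numerical illustration of the transmissibility, the mean transmission probability can be estimated by Monte Carlo. The sketch below assumes that the per-link transmission probability takes the standard form 1 - exp(-r * tau) used in [20], with r the contact rate and tau the infectious period; the displayed formula is not reproduced in this text, so this form, the function name, and the sampler arguments are assumptions.

```python
import math
import random

def mean_transmissibility(draw_rate, draw_duration, samples=100_000, seed=0):
    """Monte Carlo estimate of E[1 - exp(-r * tau)] for independent r and tau.

    draw_rate and draw_duration are caller-supplied samplers taking a
    random.Random instance (hypothetical helpers for this sketch).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        r = draw_rate(rng)          # contact rate over the link
        tau = draw_duration(rng)    # time the sender keeps spreading
        total += 1.0 - math.exp(-r * tau)
    return total / samples
```

For example, `mean_transmissibility(lambda g: g.expovariate(1.0), lambda g: 2.0)` would estimate the transmissibility for exponentially distributed contact rates and a fixed infectious period of 2 time units.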

Under these assumptions, information diffusion becomes equivalent to bond percolation on the conjoint network [20, 37]. More specifically, assume that each edge in (resp. ) is occupied, meaning that it can be used in spreading the information, with probability (resp. ) independently from all other edges. Then, the size of an information outbreak started from an arbitrary node is equal to the number of individuals that can be reached from that initial node by using only the occupied links of . Hence, the threshold and the size of information epidemics can be computed by studying the phase transition properties of the random graph which is obtained by taking a union of the occupied edges of and . More precisely, information epidemics can take place if and only if has a giant connected component that contains a positive fraction of nodes in the large limit. Also, an arbitrary node can trigger an information epidemic only if it belongs to the giant component, in which case an item of information started from that node will reach all nodes in the giant component. Hence, the fractional size of the giant component in gives both the probability that an arbitrary node triggers an information epidemic as well as the corresponding fractional size of the information epidemic.
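The bond-percolation view lends itself to a direct computation: keep each edge independently with the transmissibility of its type, then read off the connected components of the union. The following is a minimal sketch using a union-find structure; the function and argument names (`edges_f`, `edges_w`, `t_f`, `t_w`) are illustrative choices, not notation taken from the paper.

```python
import random

def find(parent, x):
    """Return the root of x, compressing the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def occupied_component_sizes(n, edges_f, edges_w, t_f, t_w, seed=0):
    """Occupy each physical (social) edge with probability t_f (t_w) and
    return the component sizes of the union graph, largest first."""
    rng = random.Random(seed)
    parent = list(range(n))
    for edges, t in ((edges_f, t_f), (edges_w, t_w)):
        for u, v in edges:
            if rng.random() < t:  # edge is occupied
                ru, rv = find(parent, u), find(parent, v)
                if ru != rv:
                    parent[ru] = rv
    sizes = {}
    for x in range(n):
        root = find(parent, x)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)
```

The first entry of the returned list divided by `n` is the empirical counterpart of the fractional giant component size discussed above.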

3 Main Results

3.1 Information Diffusion in Coupled Graphs with Arbitrary Degree Distributions

We now present the main result of our paper, which characterizes the threshold and the size of the information epidemic in by revealing its phase transition properties. First, for notational convenience, let and be random variables independently drawn from the distributions and , respectively, and let and . Further, assume that and are given by

(4)

and define the threshold function by

(5)

Finally, let in be given by the pointwise smallest solution of the recursive equations

(6)
(7)

Theorem 3.1

Under the assumptions just stated, we have

  • If then with high probability the size of the largest component satisfies . On the other hand, if , then whp.

  • Also,

    (8)

A proof of Theorem 3.1 is given in Section 6.

Theorem 3.1 quantifies the fraction of individuals in the overlaying social-physical network that are likely to receive an item of information which starts spreading from a single individual. Specifically, Theorem 3.1 shows that the critical point of the information epidemic is marked by , with the critical threshold given by (5). In other words, for any parameter set that yields (supercritical regime), an item of information has a positive probability of giving rise to an information epidemic; i.e., reaching a linear fraction of the individuals. In that case, the probability of a node triggering an information epidemic, and the corresponding asymptotic fraction of individuals who receive the information can be found by first solving the recursive equations (6)-(7) for the smallest in and then computing the expression given in (8). On the other hand, whenever it holds that (subcritical regime), we conclude from Theorem 3.1 that the number of individuals who receive the information will be with high probability, meaning that all information outbreaks are non-epidemic.

It is of interest to state whether or not Theorem 3.1 can be deduced from the phase transition results for random graphs with arbitrary degree distributions (e.g., see [35, 20, 31]). It is well known [35] that for these graphs the critical point of the phase transition is given by

where is the degree of an arbitrary node. We next show that this condition is not equivalent (and, indeed is not even a good approximation) to .

To this end, we consider a basic scenario where and are both Erdős-Rényi graphs [33] so that their degree distributions are (asymptotically) Poisson, i.e., we have and . Given that each link in (resp. in ) is occupied with probability (resp. ), the occupied degree of an arbitrary node in follows a Poisson distribution with mean if (which happens with probability ), and it follows a Poisson distribution with mean if (which happens with probability ). When becomes large this leads to

(9)

It can be seen that the above expression is not equal to the corresponding quantity – As discussed in the next subsection, for the given degree distributions we have , where is given by (14). For instance, with , and , we have while (9) yields signaling a significant difference between the exact threshold and the approximation given by (9). We conclude that the results established in Theorem 3.1 (for coupled random graphs) go beyond the classical results for single random graphs with arbitrary degree distributions.
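To make the single-graph approximation concrete, the sketch below evaluates the classical ratio for a degree that is Poisson with one mean for members of the social network and another mean for non-members. It assumes the classical criterion takes the familiar form E[d(d-1)] / E[d] = 1 and uses the fact that a Poisson variable with mean m has E[d(d-1)] = m^2; the parameter names `mean_member` and `mean_nonmember` are hypothetical labels introduced here.

```python
def single_graph_ratio(alpha, mean_member, mean_nonmember):
    """E[d(d-1)] / E[d] for a two-component Poisson mixture.

    With probability alpha the degree is Poisson(mean_member),
    otherwise it is Poisson(mean_nonmember).
    """
    first_moment = alpha * mean_member + (1.0 - alpha) * mean_nonmember
    # For Poisson(m), the second factorial moment E[d(d-1)] equals m**2.
    second_factorial = alpha * mean_member ** 2 + (1.0 - alpha) * mean_nonmember ** 2
    return second_factorial / first_moment
```

Under this assumed form, a ratio above one would predict a giant component in a single random graph with the same overall degree distribution, which is the prediction the text shows to be inaccurate for the conjoint network.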

Aside from the critical threshold and the fractional size of information epidemics, we are also interested in computing the average size of information outbreaks in the subcritical regime for a fuller understanding of the information propagation process. In other words, in the case where the fraction of informed individuals tends to zero, we wish to compute the expected number of informed nodes. For a given network with nodes , the average outbreak size is given by , where is the number of nodes that receive an item of information started from node ; i.e., is the size of the connected component containing node .

Now, let denote the average outbreak size in . It is easy to check that

(10)

where, as before, gives the size of the th largest component of the network, and denotes the total number of components. To see (10), observe that an arbitrarily selected node will belong to a component of size with probability , in which case an item of information started from that particular node will create an outbreak of size . Summing over all components of the network, we get (10). In the supercritical regime, we have so that . The next result, established in Section 6, allows computing this quantity in the subcritical regime.
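The counting argument behind (10) translates directly into code: a uniformly chosen node lies in a component of size c with probability c/n and then produces an outbreak of size c, so the average is the sum of squared component sizes divided by n. A minimal sketch (the function name is an illustrative choice):

```python
def average_outbreak_size(component_sizes):
    """Average outbreak size when the seed node is chosen uniformly at random.

    component_sizes lists the size of every connected component, so their
    sum is the number of nodes n.
    """
    n = sum(component_sizes)
    # Each component of size c is hit with probability c / n and yields c.
    return sum(c * c for c in component_sizes) / n
```

For instance, a four-node network with components of sizes 3 and 1 gives (9 + 1) / 4 = 2.5.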

Theorem 3.2

Let . With the above assumptions, let , denote the simultaneous stable solution of the equations

(11)
(12)

Then, the average outbreak size satisfies

(13)

3.2 Special Case: Information Diffusion in coupled ER graphs

A special case of interest is when both and are Erdős-Rényi graphs [33]. More specifically, let be an ER network on the vertices such that there exists an edge between any pair of distinct nodes with probability ; this ensures that the mean degree of each node is asymptotically equal to . Next, obtain a set of vertices by picking each node independently with probability . Now, let be an ER graph on the vertex set with edge probability given by . The mean degree of a node in is given (asymptotically) by as seen via (2).

Given that the degree distributions are asymptotically Poisson in ER graphs, this special case is covered by our model presented in Section 2.1 by setting and . Thus, Theorem 3.1 is still valid and can be used to obtain the condition and expected size of information epidemics. However, recent developments on inhomogeneous random graphs [38] enable us to obtain more detailed results than those given by Theorem 3.1 for this special case.

Consider now an overlay network model constructed on the vertices by conjoining the occupied edges of and , i.e., we have . Let be defined by

(14)

Also, let be the pointwise largest solution of the recursive equations

(15)

with in .

Theorem 3.3

With the above assumptions, we have

  • If , then with high probability, the size of the largest component satisfies ; in contrast, if we have whp, while the size of the second largest component satisfies .

  • Moreover,

A proof of Theorem 3.3 is given in Section 7.

Theorem 3.3 is a counterpart of Theorem 3.1. This time, the “critical point” of the information epidemic is marked by , with the critical threshold given by (14). With and , we have that , , and it is easy to check that so that part of Theorem 3.3 is compatible with part of Theorem 3.1. Also, we find (numerically) that the second parts of Theorems 3.3 and 3.1 yield the same asymptotic giant component size. Nevertheless, it is worth noting that Theorem 3.3 is not a corollary of Theorem 3.1. This is because, through a different technique used in the proofs, Theorem 3.3 provides the sharper bounds (subcritical case) and (supercritical case) that go beyond Theorem 3.1.

Figure 1: The minimum required for existence of a giant component in versus for various values. In other words, each curve corresponds to the boundary of the phase transition for the corresponding value. Above the boundary there exists a giant component, but below it all components have nodes.

We observe that the threshold function is symmetric in and , meaning that both networks have identical roles in carrying the conjoined network to the supercritical regime where information can reach a linear fraction of the nodes. To get a more concrete sense, we depict in Figure 1 the minimum required to have a giant component in versus for various values. Each curve in the figure corresponds to a phase transition boundary above which information epidemics are possible. If , the same plot shows the boundary of the giant component existence with respect to the mean degrees and . This clearly shows how two networks that are in the subcritical regime can yield an information epidemic when they are conjoined. For instance, we see that for , it suffices to have for the existence of an information epidemic. Yet, if the two networks were disjoint, it would be necessary [33] to have and .

We elaborate further on Theorem 3.3. First, we note from the classical results [33] that ER graphs have a giant component whenever average node degree exceeds one. This is compatible with part of Theorem 3.3, since the condition for giant component existence reduces to if and when . Finally, in the case where (i.e., when everyone in the population is a member of Facebook), the graph reduces to an ER graph with edge probability leading to a mean node degree of in the asymptotic regime. As expected, for the case , Theorem 3.3 reduces to classical results for ER graphs as we see that and where is the largest solution of
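For reference, the classical single-graph ER result can be evaluated numerically. The sketch below assumes the standard relation rho = 1 - exp(-c * rho) for an ER graph with mean degree c, and finds its largest solution by fixed-point iteration started from rho = 1; the function name and iteration count are illustrative.

```python
import math

def er_giant_fraction(mean_degree, iterations=1000):
    """Largest solution of rho = 1 - exp(-mean_degree * rho) on [0, 1].

    Starting from rho = 1, the iterates decrease monotonically to the
    largest fixed point, which is 0 when mean_degree <= 1.
    """
    rho = 1.0
    for _ in range(iterations):
        rho = 1.0 - math.exp(-mean_degree * rho)
    return rho
```

With a mean degree of 2 the iteration settles near 0.797, while for a mean degree of 0.5 it collapses to zero, in line with the threshold at mean degree one. Convergence is slow very close to the threshold, so more iterations may be needed there.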

4 Numerical Results

4.1 ER Networks

We first study the case where both the physical information network and the online social network are Erdős-Rényi graphs. As in Section 3.2, let be the conjoint social-physical network, where is defined on the vertices , whereas the vertex set of is obtained by picking each node independently with probability . The information transmissibilities are equal to and in and , respectively, so that the mean degrees are given (asymptotically) by and , respectively.

We plot in Figure 2 the fractional size of the giant component in versus for various values. In other words, the plots illustrate the largest fraction of individuals that a particular item of information can reach. In this figure, the curves stand for the analytical results obtained by Theorem 3.3 whereas marked points stand for the experimental results obtained with nodes by averaging experiments for each data point. Clearly, there is an excellent match between the theoretical and experimental results. It is also seen that the critical threshold for the existence of a giant component (i.e., an information epidemic) is given by when , when , and when . It is easy to check that these values are in perfect agreement with the theoretically obtained critical threshold given by (14).

In the inset of Figure 2, we show the average outbreak size versus under the same setting. Namely, the curves stand for the analytical results obtained from Theorem 3.2, while the marked points are obtained by averaging the quantity given in (10) over 200 independent experiments. We see that experimental results are in excellent agreement with our analytical results. Also, as expected, average outbreak size is seen to grow unboundedly as approaches the corresponding epidemic threshold.

Figure 2: The fractional size of the giant component in versus . The curves correspond to analytical results obtained from Theorem 3.3, whereas marked points stand for the experimental results obtained with by averaging experiments for each point. (Inset) Average outbreak size versus under the same setting.

4.2 Networks with Power Degree Distributions

To gain more insight into the consequences of Theorem 3.1 for real-world networks, we now consider a specific example of information diffusion when the physical information network and the online social network have power-law degree distributions with exponential cutoff. Specifically, we let

(16)

and

(17)

where , , and are positive constants and the normalizing constant is the th polylogarithm of ; i.e.,

Power law distributions with exponential cutoff are chosen here because they have been used to model a variety of real-world networks [20, 27]. In fact, a detailed empirical study on the degree distributions of real-world networks [34] revealed that the Internet (at the level of autonomous systems), the phone call network, the e-mail network, and the web link network all exhibit power law degree distributions with exponential cutoff.
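To make the degree model concrete, the sketch below builds a power-law distribution with exponential cutoff and normalizes it with a truncated polylogarithm series. It assumes the commonly used form in which the probability of degree k is zero for k = 0 and proportional to k^(-γ) e^(-k/Γ) for k ≥ 1; the parameter names `gamma` and `cutoff` are illustrative stand-ins for the exponent and the cutoff constant.

```python
import math


def polylog(s, x, terms=10_000):
    """Truncated series Li_s(x) = sum_{k>=1} x**k / k**s, for 0 <= x < 1."""
    return sum(x ** k / k ** s for k in range(1, terms + 1))


def cutoff_power_law_pmf(gamma, cutoff, k_max=1000):
    """Return [p_0, p_1, ..., p_{k_max}] for the assumed degree model.

    Assumption: p_0 = 0 and p_k = k**(-gamma) * exp(-k / cutoff) / norm,
    with norm = Li_gamma(exp(-1 / cutoff)).
    """
    norm = polylog(gamma, math.exp(-1.0 / cutoff))
    pmf = [0.0]
    pmf += [k ** (-gamma) * math.exp(-k / cutoff) / norm
            for k in range(1, k_max + 1)]
    return pmf


if __name__ == "__main__":
    pmf = cutoff_power_law_pmf(2.5, 10.0)
    mean_degree = sum(k * p for k, p in enumerate(pmf))
    print(round(sum(pmf), 6), round(mean_degree, 4))
```

The truncation lengths are safe here because the exponential factor makes the tail negligible long before the series is cut off; a cutoff constant much larger than the ones used in this sketch would call for more terms.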

To apply Theorem 3.1, we first compute the epidemic threshold given by (5). Under (16)-(17) we find that

Similar expressions can be derived for and . It is now a simple matter to compute the critical threshold from (5) using the above relations. Then, we can use Theorem 3.1(i) to check whether or not an item of information can reach a linear fraction of individuals in the conjoint social-physical network .

To that end, we depict in Figure 3 the minimum value required to have a giant component in versus , for various values. In other words, each curve corresponds to a phase transition boundary above which information epidemics are possible, in the sense that an item of information has a positive probability of reaching a linear fraction of individuals in the overlaying social-physical network. In all plots, we set and . The and values are multiplied by the corresponding and values to make a fair comparison with the disjoint network case where it is required [20] to have (or ) for the existence of an epidemic; under the current setting we have . Figure 3 illustrates how conjoining two networks can speed up information diffusion. It can be seen that even for small values, two networks, albeit having no giant component individually, can yield an information epidemic when they are conjoined. As an example, we see that for , it suffices to have for the existence of an information epidemic in the conjoint network , whereas if the networks and are disjoint, an information epidemic can occur only if or .
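For the disjoint-network comparison mentioned above, the single-network epidemic threshold can be evaluated numerically. The sketch below assumes the classical condition that the transmissibility must exceed ⟨k⟩ / (⟨k²⟩ - ⟨k⟩), and the degree model with weights k^(-γ) e^(-k/Γ); it does not implement the conjoint threshold (5). The names `single_network_threshold`, `gamma`, and `cutoff` are hypothetical.

```python
import math


def polylog(s, x, terms=10_000):
    """Truncated series Li_s(x) = sum_{k>=1} x**k / k**s, for 0 <= x < 1."""
    return sum(x ** k / k ** s for k in range(1, terms + 1))


def single_network_threshold(gamma, cutoff):
    """Critical transmissibility <k> / (<k^2> - <k>) for one network.

    Assumption: degrees follow k**(-gamma) * exp(-k / cutoff), so the
    first two moments are ratios of polylogarithms and the common
    normalizing constant cancels.
    """
    x = math.exp(-1.0 / cutoff)
    first = polylog(gamma - 1, x)   # proportional to <k>
    second = polylog(gamma - 2, x)  # proportional to <k^2>
    return first / (second - first)


if __name__ == "__main__":
    for cutoff in (5.0, 10.0, 100.0):
        print(cutoff, round(single_network_threshold(2.5, cutoff), 4))
```

A larger cutoff constant lets higher degrees through, which raises the second moment relative to the first and therefore lowers the threshold.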

Figure 3: The minimum required for the existence of a giant component in versus . The distributions and are given by (16) and (17), with and . The and values are multiplied by the corresponding and values to provide a fair comparison with the disjoint network case; under the current setting we have .
Figure 4: The fractional size of the giant component in versus . The distributions and are given by (16) and (17), with and . The and values are multiplied by the corresponding and values for fair comparison with the disjoint network case; under the current setting we have . The curves were obtained analytically via Theorem 3.1, whereas the marked points stand for the experimental results obtained with nodes by averaging experiments for each parameter set. We see that there is an excellent agreement between theory and experiments. (Inset) Average outbreak size versus under the same setting.

Next, we turn to computation of the giant component size. We note that

and similar expressions can be derived for and . Now, for any given set of parameters, , we can numerically obtain the giant component size of by substituting the above relations into part of Theorem 3.1.

To this end, Figure 4 depicts the fractional size of the giant component in versus , for various values; as before, we set and yielding . In other words, the plots show the largest fraction of individuals in the social-physical network who receive an item of information that started spreading from a single individual. In Figure 4, the curves were obtained analytically via Theorem 3.1 whereas the marked points stand for the experimental results obtained with nodes by averaging experiments for each parameter set. We see that there is an excellent agreement between theory and experiment. Moreover, according to the experiments, the critical threshold for the existence of a giant component (i.e., an information epidemic) appears at when , when , and when . These values are in perfect agreement with the theoretically obtained critical threshold given by (5).

The inset of Figure 4 shows the average outbreak size versus under the same setting. To avoid the finite size effect (observed by Newman [20] as well) near the epidemic threshold, we have increased the network size up to to obtain a better fit. Again, we see that experimental results (obtained by averaging the quantity (10) over 200 independent experiments) agree well with the analytical results of Theorem 3.2.

5 Online Social Networks with nodes

Until now, we have assumed that apart from the physical network on nodes, information can spread over an online social network which has members. However, one may also ask what happens if the number of nodes in the online network is a sub-linear fraction of . For instance, consider an online social network whose vertices are selected by picking each node with probability where . This would yield a vertex set that satisfies

(18)

with high probability for any . We now show that, asymptotically, social networks with nodes have almost no effect in spreading information. We start by establishing an upper bound on the size of the giant component in .

Proposition 5.1

Let be a graph on vertices , and be a graph on the vertex set . With , we have

(19)

where and are sizes of the first and second largest components of , respectively.

Proof.

It is clear that will take its largest value when is a fully connected graph; i.e., a graph with edges between every pair of vertices. In that case the largest component of can be obtained by taking a union of the largest components of that can be reached from the nodes in . With denoting the set of nodes in the largest component (of ) that can be reached from node , we have

(20)

where stands for the th largest component of . The inequality (20) is easy to see once we write

where is the th element of . The above quantity is a summation of the sizes of mutually disjoint components of . As a result, this summation can be no larger than the sum of the first largest components of . The desired conclusion (19) is now immediate as we note that for all . ∎
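The counting argument in the proof can be checked numerically on small random instances. The sketch below makes the online network fully connected on a random member set and compares the largest component of the union with the sum of the largest components of the physical network, taking as many components as there are members. The weaker bound it also reports (largest component plus the number of members minus one, times the second largest component) is derived here from the last step of the proof and is not claimed to be (19) verbatim. All names are hypothetical.

```python
import random


def components(n, edges):
    """Return the connected components of a graph on nodes 0..n-1."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())


def check_bound(n, p, n_f, seed=0):
    """Return (largest component of W union F, top-n_f sum, weaker bound)."""
    rng = random.Random(seed)
    w_edges = [(i, j) for i in range(n) for j in range(i + 1, n)
               if rng.random() < p]
    members = rng.sample(range(n), n_f)
    # A path through the members yields the same components as a
    # complete graph on them, which is the worst case in the proof.
    f_edges = [(members[i], members[i + 1]) for i in range(n_f - 1)]
    sizes_w = sorted((len(c) for c in components(n, w_edges)), reverse=True)
    largest_h = max(len(c) for c in components(n, w_edges + f_edges))
    top_sum = sum(sizes_w[:n_f])
    c1 = sizes_w[0]
    c2 = sizes_w[1] if len(sizes_w) > 1 else 0
    return largest_h, top_sum, c1 + (n_f - 1) * c2


if __name__ == "__main__":
    print(check_bound(300, 0.5 / 300, 10, seed=3))
```

The check requires at least one member, since with an empty member set the union is just the physical network.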

The next result is an easy consequence of Proposition 5.1 and classical results [33] for ER graphs.

Corollary 5.1

Let be an ER graph on the vertices and let be a graph whose vertex set satisfies (18) whp. The following hold for :

  • If is in the subcritical regime (i.e., if ), then whp we have .

  • If , then we have