Modeling of Information Diffusion on Social Networks with Applications to WeChat

Affiliations: (1) College of Information System and Management, National University of Defense Technology, 410073 Changsha, China; (2) Faculty of Electrical Engineering, Mathematics, and Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands

Abstract

Traces of user activities recorded in online social networks such as the creation, viewing and forwarding/sharing of information over time open new possibilities to quantitatively and systematically understand the information diffusion process on social networks. From an online social network like WeChat, we could collect a large number of information cascade trees, each of which tells the spreading trajectory of a message/information such as which user creates the information and which users view or forward the information shared by which neighbours. In this work, we propose two heterogeneous non-linear models, one for the topologies of the information cascade trees and the other for the stochastic process of information diffusion on a social network. Both models are validated by the WeChat data in reproducing and explaining key features of cascade trees.

Specifically, we first apply the Random Recursive Tree (RRT) to model the cascade tree topologies, capturing key features, i.e. the average path length and degree variance of a cascade tree in relation to the number of nodes (size) of the tree. The RRT model with a single parameter $\theta$ describes the growth mechanism of a tree, where a node in the existing tree has a probability proportional to $d_i^{\theta}$ of being connected to a newly added node, with $d_i$ the degree of the existing node. The identified parameter quantifies the relative depth or broadness of the cascade trees, indicating whether information propagates via star-like broadcasting or viral-like hop-by-hop spreading. The RRT model explains the appearance of hubs, and thus a possibly smaller average path length as the cascade size increases, as observed in WeChat. We further propose the stochastic Susceptible View Forward Removed (SVFR) model to depict the dynamic user behaviors including creating, viewing, forwarding and ignoring a message on a given social network. Besides the average path length and degree variance of the cascade trees in relation to their sizes, the SVFR model could further explain the power-law cascade size distribution in WeChat and reveal that a user with a large number of friends may actually have a smaller probability to read a message (s)he receives due to limited attention.

Research

Authors:
Liang Liu (National University of Defense Technology; Delft University of Technology), liuliang@nudt.edu.cn
Bo Qu (Delft University of Technology), B.Qu@tudelft.nl
Bin Chen (National University of Defense Technology), nudtcb9372@gmail.com
Alan Hanjalic (Delft University of Technology), a.hanjalic@tudelft.nl
Huijuan Wang (Delft University of Technology, corresponding author), H.Wang@tudelft.nl

Keywords: information diffusion; stochastic model; social networks; WeChat; random recursive tree.

1 Introduction

The rapid development of the Internet, smart phones and information technology has facilitated the boost of online social networks, such as Facebook, Twitter, Flickr, Digg and Sina Weibo. Such online social networks allow message, content and information in general to spread faster and wider than ever (e.g. retweeting) [1, 2, 3, 4]. Understanding the features and dynamics of information diffusion in social networks is crucial for businesses to promote products, but also for governments to predict and even regulate public opinion [5, 6, 7].

Most empirical work on the aforementioned social networks has focused on basic statistical analysis of the features of the social networks, of the content popularity or of the content/information diffusion trajectories [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]. For example, the information diffusion trajectory on a social network can generally be represented by a cascade (tree), where the root is the source node that creates the information and where the links represent the information transmitting paths between the users. The average path length of a cascade tree (the average number of links in the shortest path between two nodes), also called structural virality, in relation to the size of the tree, may indicate how deep (viral-like information propagation) or shallow (broadcasting type of information diffusion) a cascade tree is [21, 12]. Time series analysis of e.g. content popularity over time has been explored to distinguish between different types of contents [22, 23]. Machine learning techniques have also been applied to, for example, predict content popularity based on the previous popularity and the features related to the contents and users that shared the content [24, 25].

Stochastic models, such as cellular automata [26], Threshold models [27, 28, 29], Susceptible Infected Recovered (SIR) [30, 31, 32, 17], and Linear Influence [33] have been studied to understand how the dynamics of information diffusion, such as the spreading rate, and the social network topology could influence a key feature of the diffusion process such as the popularity. However, we still do not sufficiently understand whether such first order models with few parameters could quantitatively reproduce several key features of real-world information diffusion.

In this paper we aim to propose stochastic models with few parameters for (a) the topology of the information cascade trees and (b) the dynamics of information diffusion on a social network, that could capture several key features observed in the real-world cascade trees. Our models will be applied to and validated by the WeChat dataset. WeChat is the most popular smart phone application in China and has about 800 million monthly active user accounts [34]. Apart from some sporadic efforts (e.g. [35]), information diffusion in WeChat has not been studied extensively. We also choose WeChat data to illustrate and validate our modelling methods because:

  • WeChat is a semi-closed social network where information is shared mainly via strong social ties (i.e. friends that mutually agree to share information)

  • WeChat records both the view and forward actions of each user and also from which node the information arriving to the user has been forwarded.

Topologies of cascade trees have so far been characterised by their average path length in relation to their sizes [21, 12]. The size of a cascade tree, i.e. the number of users that a message has reached in total in a social network, may range from hundreds to millions. Although a weak (strong) correlation between the average path length and the size of a tree may suggest shallow (deep) tree structures [36], we lack a systematic method to quantify the shallowness or deepness of a group of cascade trees. In this work, we propose to use the generalised random recursive tree (RRT) [37, 38] with a single parameter to model cascade trees (possibly of a given type of content) with diverse sizes on an online platform. The RRT, a growth tree model, could well capture two features of WeChat cascade trees: the average path length and the degree variance, as a function of the cascade size. The identified parameter in the RRT model quantifies how deep or shallow the cascade trees are and implies the possible growing mechanisms of cascade trees. By applying the RRT to the WeChat data, we find that hubs tend to appear as WeChat cascade trees become large in size, leading towards broadcast-like information diffusion.

Secondly, we propose the heterogeneous Susceptible View Forward Removed (SVFR) model, which allows users to have different probabilities of viewing a message, depending on their degree (the number of neighbours) in the underlying social network. Interestingly, our SVFR model could well explain the power-law distributed size of cascade trees, the degree variance and the average path length of a cascade tree in relation to the tree size as observed in the WeChat dataset. Importantly, our SVFR model benchmarked by the WeChat data points out that a WeChat user with a large number of friends tends to have a lower probability of viewing the messages shared by his/her friends, likely due to his/her limited energy and attention online.

The remainder of this paper is organized as follows: Section 2 describes the WeChat Moments diffusion data and how to construct the cascade trees. Sections 3 and 4 present the RRT and SVFR models, which capture the structure of the cascade trees and the dynamics of the information diffusion, respectively. Section 5 summarizes our findings and points out interesting future work.

2 Dataset Description

We first introduce the information diffusion dataset from the WeChat platform (https://en.wikipedia.org/wiki/WeChat) [39, 40], which will be used to validate the two models that we propose. WeChat Moments (WM; https://en.wikipedia.org/wiki/Moments_(social_networking)), known as Friends Circle, serves social networking functions in which users can view information shared by friends. In this work we focus on the diffusion of web pages in the WM network. A user may react to the web page forwarded/shared by his/her friend in three ways: (i) view the web page, meaning that the user clicks the link of the web page and views the content, (ii) ignore the web page without a click, and (iii) forward (or share) the URL of the web page to his/her friends.

An example of the diffusion of a web page in the WM network is shown in Figure 1. First, a user at the root of the tree creates a web page and makes it available to his/her friends. These friends may then ignore, view or forward the web page after seeing it appear in their Friends Circle with a title. Each forwarding of the information (web page) allows the friends of the forwarding user to further view, forward or ignore the information.

Figure 1: Schematic diagram of the diffusion of a web page in WeChat. Colors differentiate between the users showing different behaviors regarding what they do with the information offered to them. The green circles represent users who have viewed the message. The blue circles stand for users who have shared the message after viewing it. The gray circles are those users who have not viewed the content. A (view) cascade tree is composed of the source node that creates the message, the nodes that have viewed the message (thus both the blue and green nodes) and the black solid arrows among them. The (view) cascade tree of each web page is recorded in the data.

We obtained the web page spreading dataset in WeChat Moments from a third-party service company444http://www.fibodata.com/. The service company helps users create HTML5 format web pages to share the information including advertisements, web games, news, articles or holiday greetings. The dataset recorded from January to February in all user activities, such as view and forward, and their corresponding time stamps related to all the web pages created with the format support from the service company. Both the content of the web pages and users are anonymised by web page indexes and user indexes, respectively. A user must first view a web page before (s)he forwards it. Whenever a user views a web page shared by a friend, the index of both the user who views the web page and the friend who shares the web page are recorded in the dataset, allowing us to construct the cascade tree for each web page. We select the web pages whose diffusion starts and ends within the period of 45 days. We assume that the diffusion of a web page stops if after the generation of the web page there is a day that no viewing/forwarding/sharing happens. Although resurgence of the spread of a content may occur after a silent day [36], our assumption is supported by the data: given that a web page is neither forwarded nor viewed by any user in a given day, the probability that forwarding or viewing of this page occurs after that day within the days’ observation window is . We obtain web pages, whose life span is approximately within the considered time window. More than million users are involved in the diffusion of these web pages. For each web page, we construct its cascade tree, in which nodes represent the users who have viewed the web page and some of these nodes may have forwarded the web page. Such a cascade tree is also called a view cascade tree. Each information cascade is a tree without cycles because a user seldom views/forwards the same content more than once. 
If, in the rare case a user views (shares) a web page more than once, we consider only the first time when the user views (shares) the page.
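The construction of view cascade trees described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout `(page, viewer, sharer, timestamp)` and the function name `build_view_cascades` are assumptions introduced here, and the sketch further assumes that the sharer in the earliest record of a page is its creator.

```python
from collections import defaultdict


def build_view_cascades(records):
    """Build one parent map per web page from view records.

    records: iterable of (page, viewer, sharer, timestamp) tuples. This field
    layout is hypothetical; it is not the format of the WeChat dataset.
    Returns {page: {viewer: sharer}}; the creator of a page has no parent.
    """
    cascades = defaultdict(dict)
    roots = {}
    for page, viewer, sharer, _ in sorted(records, key=lambda r: r[3]):
        # Assumption: the sharer in the earliest record is the page creator.
        roots.setdefault(page, sharer)
        if viewer == roots[page] or viewer in cascades[page]:
            continue  # keep only the first view of a page by each user
        cascades[page][viewer] = sharer
    return dict(cascades)
```

Each resulting parent map has one entry per viewing user, so the cascade size is the number of entries plus one for the creator.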

These cascade trees collected from WeChat will be used to validate our models, i.e. to test whether our models could reproduce several key features of the observed cascade trees.

3 Modeling of Information Cascade Tree Structure

In this section, we focus on the modelling of the topologies of the information cascade trees, without considering the underlying dynamics of users. We aim to propose a tree model that could construct trees that share similar properties of the cascade trees observed in WeChat. Firstly, we will analyse two fundamental properties of the information cascade trees in WeChat, that we would like our model to reproduce, namely the average path length and degree variance. Afterwards, we propose to use the Random Recursive Tree (RRT) model to model information cascade trees and illustrate to what extent this model could capture the two key features of the information cascades in WeChat.

3.1 Cascade Structure in WeChat

Two basic properties of a generic tree are the average path length and the degree variance. The average path length is also known as "structural virality" and is, up to normalization, the "Wiener index". It is defined as the average of the number of links $h_{ij}$ in the shortest path between any two nodes $i$ and $j$. Hence, in a tree with $N$ nodes we can formulate it as

\[ E[H] = \frac{2}{N(N-1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} h_{ij}. \tag{1} \]

The degree variance is the variance of the degrees of all the nodes in a tree with $N$ nodes,

\[ \mathrm{Var}[D] = \frac{1}{N} \sum_{i=1}^{N} \left( d_i - E[D] \right)^2, \tag{2} \]

where the degree $d_i$ of node $i$ tells how many links the node has and $E[D] = \frac{1}{N} \sum_{i=1}^{N} d_i$ is the average degree of all the nodes. The degree variance can be equivalently characterized by the standard deviation of the degree, which is used later in our data analysis and model validation.
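Both properties can be computed directly from a cascade tree. The sketch below is one possible implementation under stated assumptions: the tree is given as a child-to-parent map (a representation chosen here for illustration), the average runs over all node pairs, and the variance is normalized by the number of nodes. The function names are hypothetical.

```python
from collections import deque


def tree_adjacency(parent):
    """Turn a child -> parent map into an undirected adjacency map."""
    adj = {}
    for child, par in parent.items():
        adj.setdefault(child, set()).add(par)
        adj.setdefault(par, set()).add(child)
    return adj


def average_path_length(adj):
    """Average number of links on the path between two distinct nodes."""
    n = len(adj)
    total = 0  # sum of distances over ordered node pairs
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))


def degree_variance(adj):
    """Variance of the node degrees, normalized by the number of nodes."""
    degrees = [len(neighbours) for neighbours in adj.values()]
    mean = sum(degrees) / len(degrees)
    return sum((d - mean) ** 2 for d in degrees) / len(degrees)
```

The breadth-first search from every node costs quadratic time in the tree size, which is acceptable for a sketch but may need a linear-time tree algorithm for very large cascades.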

It has been shown that the average path length is an important characteristic of information cascades and complex networks in general [41, 12, 42]. Consider the class of cascade trees collected from an online social network. If the average path length of a cascade tree does not increase much with the size (number of nodes) of the tree, hubs may exist in relatively large cascade trees. In this case, information propagates via star-like broadcasting and large cascade trees are relatively shallow. However, if the cascade trees’ average path lengths increase dramatically with their sizes, large cascade trees tend to be deep without large hubs and information spreads viral-like, hop by hop.

Both properties are sensitive to the size of the tree. For example, large trees may tend to have a large average path length. As shown in Figure 5, the sizes of the cascade trees collected from WeChat follow approximately a power-law distribution. Hence, we group the cascade trees according to their sizes, using size intervals that are split uniformly on a logarithmic scale. We consider cascade trees that have more than nodes in the dataset, which correspond to the web pages that could propagate to a certain extent. Both properties are explored for each group of trees. Figure 2 (a) and (b) show the average path length and degree variance of a cascade tree (group) as a function of the size of the tree (group), respectively. The average path length increases as the size of the cascade tree increases, except when the size is large. The decrease in the average path length around size is due to the hubs in the cascade trees, i.e. higher degree nodes, which is reflected in the large degree variance of large cascade trees.
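The logarithmic grouping of tree sizes can be sketched as below. The interval boundaries [100,200), [200,400), [400,800), ... follow the caption of Figure 2; the function names and the `(size, value)` input layout are assumptions made for this illustration.

```python
from collections import defaultdict


def size_group(size, smallest=100):
    """Return the doubling interval [low, 2*low) that contains size.

    Trees smaller than `smallest` are not grouped and yield None.
    """
    if size < smallest:
        return None
    low = smallest
    while size >= 2 * low:
        low *= 2
    return (low, 2 * low)


def group_values_by_size(trees, smallest=100):
    """Collect a per-tree property by size group.

    trees: iterable of (size, value) pairs, where value is for example the
    average path length of that tree.
    """
    groups = defaultdict(list)
    for size, value in trees:
        group = size_group(size, smallest)
        if group is not None:
            groups[group].append(value)
    return dict(groups)
```

The mean and standard deviation of each list then give one point and one error bar per size group.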

We aim to propose a tree model that could capture both properties as a function of tree size and could further quantitatively characterize how deep/shallow the cascade trees are.

3.2 The Random Recursive Tree Model

We propose to use Random Recursive Trees (RRTs) to model the cascade trees. An RRT [37, 38, 43] is a growth tree model that starts with the root node and, at each time step, adds a new node and connects it to an existing node selected as follows: each existing node $i$ with degree $d_i(t)$ at time $t$ has the probability $d_i^{\theta}(t) / \sum_j d_j^{\theta}(t)$ of being connected to the newly added node, where the sum runs over all existing nodes. Hence, the probability that an existing node is connected to a newly added node is proportional to the degree of this node raised to the power $\theta$. An RRT is thus specified by its number of nodes $N$ and its scaling parameter $\theta$. Specifically, $\theta = 0$ corresponds to a uniform recursive tree (URT), where at each time step a uniformly randomly selected existing node is connected to the newly added node [44, 45]. The case $\theta = 1$ is a scale-free tree, where at each time step the probability for an existing node to be connected to the new node is proportional to the degree of this node [46]. When $0 < \theta < 1$ ($\theta > 1$), the probability that an existing node is attached to a new node is sub-linear (super-linear) in the degree of the existing node. When $\theta \to \infty$, the RRT approaches a star topology, whose average path length is $2(N-1)/N$ for a star with $N$ nodes.
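The growth rule above can be sketched in a few lines. This is an illustrative implementation under assumptions made here: the scaling parameter is called `theta`, nodes are labelled by their arrival order starting from 0, and the second node always attaches to the root because no other node exists yet.

```python
import random


def grow_rrt(n, theta, seed=None):
    """Grow a random recursive tree with n nodes.

    Each new node attaches to an existing node chosen with probability
    proportional to (degree of that node) ** theta.
    Returns a child -> parent map; node 0 is the root.
    """
    rng = random.Random(seed)
    parent = {}
    if n < 2:
        return parent
    parent[1] = 0        # the second node can only attach to the root
    degree = [1, 1]
    for new in range(2, n):
        weights = [d ** theta for d in degree]
        target = rng.choices(range(new), weights=weights)[0]
        parent[new] = target
        degree[target] += 1
        degree.append(1)
    return parent
```

With `theta = 0` all weights are equal and the sketch yields a uniform recursive tree, while `theta = 1` gives linear preferential attachment. Very large values of `theta` combined with large trees can overflow floating-point arithmetic in this simple form, so the weights would need rescaling in that regime.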

We conduct 1000 independent realisations of each RRT class with a given size and scaling parameter and obtain for each class the average as well as the standard deviation of the two key topological features, i.e. the average path length and the degree variance. As illustrated in Figure 2, a small (large) scaling parameter suggests a relatively deep (shallow, star-like) tree with a large (small) average path length, which corresponds to the viral (broadcast) type of information diffusion.

Figure 2 shows that the average path length and degree variance, equivalently reflected by the degree standard deviation, in WeChat cascade trees as a function of the tree size can be well captured by RRT model with the scaling parameter around if we look at the mean of these two properties. When the variance of these properties, i.e. error bar, is taken into account, the WeChat cascade trees can be well described by the RRT model with . This observation suggests that the WeChat cascade trees may follow a growth rule where a high degree node in the tree has a high probability to attract the connection to new nodes, such that large trees tend to have hubs, a large degree variance and a moderate average path length. When is positive, the average path length of RRTs increases first and decreases afterwards as the size of the tree increases. This can be observed evidently in the RRTs when in Figure 2. The average path length starts to decrease at a small tree size if is large. Such a change of the average path length as a function of the tree size is due to the fact that as a RRT grows with a positive , hubs tend to appear and have a significantly higher chance to be connected to newly added nodes, and thus reduce the average path length and increase the degree variance. The average path length in WeChat cascade trees indeed increases first and then decreases as the cascade tree size increases, which can be thus well captured by the RRT model.

The RRT model could be used to model the cascade trees, not limited to WeChat, that have diverse sizes. The parameter that best fits the data reflects quantitatively how deep the tree is and how diverse the degrees of the nodes in the tree are. In this way, we could compare different online systems with respect to in which system information propagates more via hubs/broadcasting or viral-like spreading.

Figure 2: The average path length and degree standard deviation of the cascade trees in WeChat and the RRT models as a function of tree size. The cascade trees in WeChat are grouped according to their sizes: [100,200), [200,400), [400,800), [800,1600) etc. The average and standard deviation (error bar) of these two properties are obtained for each group and plotted as a function of the medium size of each group. For a given size of the trees and a given scaling parameter, 1000 RRTs are generated independently and the average and standard deviation (error bar) of the average path length and degree standard deviation are obtained from the 1000 realizations. The error bars for the two properties are shown for the RRT model with and .

4 Modeling of Information Cascade Process

In this section, we aim to develop a stochastic model of the information diffusion process. We develop this model based on our understanding of the WeChat information diffusion mechanisms and validate the model according to three key features observed in the WeChat dataset: the distribution of the sizes of the cascade trees, the average path length and the degree variance of a cascade tree in relation to the size of the tree.

4.1 The Susceptible View Forward Removed Model

We propose the Susceptible View Forward Removed (SVFR) model to describe the information diffusion process on a social network. This model is based on classic viral spreading models such as the SIR model, but is more general and practical with respect to the definition of the possible states of a user and the possibly non-linear and non-homogeneous probability for a user to view a message shared by a friend.

In the SVFR model, each node can be in one of the following four states at any time step:

  • Susceptible (S) - the user has the potential to read a message/content, but has not yet read it,

  • View (V) - the user views the message,

  • Forward (F) - the user forwards the message,

  • Removed (R) - the user ignores the message either because (s)he does not want to read the message or has already viewed or forwarded the message.

For a given message, all the nodes are initially susceptible, except for the node that first publishes/shares this message and is thus in state F at the initial time step. The state transition diagram is shown in Figure 3. For any node that is in state F at any time step $t$, each of its susceptible neighbours in the social network has a probability $\beta$ to view the message at step $t+1$. Moreover, each neighbor that views the message has a probability $\gamma$ to forward the message immediately after reading, and thus transits to state F at step $t+1$. In other words, each susceptible neighbor of a node in state F at time $t$ has a probability $\beta(1-\gamma)$ of being in state V (view but not forward), a probability $\beta\gamma$ of being in state F (read and forward) and a probability $1-\beta$ of being in state R (ignore the message without reading the content) at time step $t+1$. Any node in state V or F at a given time step will be in state R at the next time step. The diffusion process of a message stops when all the nodes are either in state S or R, i.e. when the system reaches the stable state.

Figure 3: State transition diagram of the SVFR model, with the four states Susceptible, View, Forward and Removed.

Furthermore, we generalize the SVFR model to be a heterogeneous stochastic model where the probability that a user reads a message shared by a friend may depend on the degree of this user in the underlying social network. This is motivated by the fact that a node that has a large number of friends tends to have a low probability to read a message shared by a friend, due to the large number of messages he/she is exposed to and his/her limited effort in reading messages [47, 48]. Without loss of generality, we assume that the probability $\beta_i$ for a node $i$ to read a message shared by a neighbor may depend on the degree $d_i$ of this node as in (3), where the power exponent $\alpha$ is assumed to be positive and the constant $c$ is determined by the given average probability $\beta$ to view a message over all the $N$ nodes of the social network, i.e. $\frac{1}{N}\sum_{i=1}^{N} \beta_i = \beta$ (each node may view a message at most once):

\[ \beta_i = c \, d_i^{-\alpha}. \tag{3} \]

As observed in the data and assumed in our model, users seldom read or share a message more than once. The average view probability suggests how infectious/interesting a message is for users to view it. When the degree scaling parameter $\alpha = 0$, all nodes have the same view probability. Similar homogeneity has usually been assumed in previously proposed information diffusion models [12]. Our heterogeneous model takes into account the possibility that the view probability of each node may decrease with the degree of the node, as characterised by the degree scaling parameter $\alpha$. In the proposed stochastic model, we do not take into account a realistic and possibly heterogeneous time delay, e.g. between the time when a node shares a message and the time when a neighbor reads or shares the message.
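The degree-dependent view probability can be sketched as follows, assuming that it scales as a negative power of the degree and that the normalizing constant is fixed by the target average over all nodes. The function name, the argument names and the clipping at 1 are choices made for this illustration; every node is assumed to have degree at least 1.

```python
def degree_view_probs(adj, mean_view_prob, alpha):
    """View probability per node, proportional to degree ** (-alpha).

    adj: node -> collection of neighbours (every degree must be >= 1).
    The constant c is chosen so that the average over all nodes equals
    mean_view_prob, before clipping at 1.
    """
    raw = {v: len(neighbours) ** (-alpha) for v, neighbours in adj.items()}
    c = mean_view_prob * len(raw) / sum(raw.values())
    # Clipping keeps each value a valid probability; if it is ever active,
    # the realized average falls below mean_view_prob.
    return {v: min(1.0, c * r) for v, r in raw.items()}
```

Setting `alpha = 0` recovers the homogeneous case in which every node has the same view probability.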

We assume that the probability that a user forwards a message after viewing it, the so-called forward probability, is a constant, which is a simple starting point for the model study. Given the underlying social network and given the three parameters to be calibrated (the average view probability, the forward probability and the degree scaling parameter), the SVFR model can be iterated to simulate the stochastic propagation of messages, each realisation resulting in a cascade tree composed of the users that have created, viewed or forwarded the message.
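One realisation of the SVFR process can be sketched as below. This is a simplified illustration rather than the authors' simulator: the function and argument names are hypothetical, and the sketch assumes that a susceptible node exposed by several forwarders in the same time step decides only once, at its first exposure.

```python
import random


def simulate_svfr(adj, source, view_prob, forward_prob, seed=None):
    """Run one SVFR diffusion and return the view cascade tree.

    adj: node -> iterable of neighbours in the social network.
    view_prob: node -> probability that the node views a shared message.
    forward_prob: probability of forwarding after viewing (a constant).
    Returns a child -> parent map; `source` is the root.
    """
    rng = random.Random(seed)
    state = {v: "S" for v in adj}
    state[source] = "F"
    parent = {}
    forwarders = [source]
    while forwarders:                      # one loop iteration per time step
        next_forwarders = []
        for u in forwarders:
            for v in adj[u]:
                if state[v] != "S":
                    continue
                if rng.random() < view_prob[v]:
                    parent[v] = u          # v views the message shared by u
                    if rng.random() < forward_prob:
                        state[v] = "F"
                        next_forwarders.append(v)
                    else:
                        state[v] = "R"     # V now, Removed at the next step
                else:
                    state[v] = "R"         # ignored without viewing
            state[u] = "R"                 # a forwarder is removed afterwards
        forwarders = next_forwarders
    return parent
```

The size of the resulting cascade is `len(parent) + 1`, counting the source node.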

4.2 Model Validation

Figure 4: Distribution of the average forward probability in a cascade tree. This distribution is obtained from the WeChat cascade trees that have a size larger than or equal to .

The (average) forward probability in a cascade tree can be obtained as the number of nodes that forward the message over the total number of nodes in the cascade. Figure 4 shows that the forward probabilities of the WeChat cascade trees follow approximately a Gaussian distribution where forward probabilities are close to the average. Hence, we consider the average forward probability observed in the data as the forward probability in our SVFR model.

The WeChat social network topology is unknown. Hence, we cannot derive directly from the data the two parameters related to the degree dependent view probabilities: the average view probability and the degree scaling parameter. Instead, we will explore whether the SVFR model with these two tunable parameters could reproduce the three key features of the WeChat cascade trees: the size distribution, the average path length and degree variance in relation to the tree size. The distribution of the sizes of the cascade trees is a crucial feature for an online social network, characterizing the distribution of the prevalence or popularity of the information propagated on the network. We assume that the underlying social network is a scale-free network with a power law degree distribution , as observed in many real-world networks [49]. We use the configuration model [50, 51, 52] to construct the random scale-free networks with a power exponent of the degree distribution , a minimum degree as in [12] and a cutoff of the maximum degree [53], where is the network size. When the network size is , the average degree .
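A configuration-model construction of a scale-free network can be sketched with the standard library alone. The exponent, minimum degree and cutoff are left as arguments because their values are not recoverable here; the function names, the parity fix and the removal of self-loops and duplicate links are simplifications chosen for this illustration.

```python
import random


def powerlaw_degree_sequence(n, exponent, k_min, k_max, rng):
    """Draw n degrees with P(k) proportional to k ** (-exponent)."""
    ks = list(range(k_min, k_max + 1))
    weights = [k ** (-exponent) for k in ks]
    degrees = rng.choices(ks, weights=weights, k=n)
    if sum(degrees) % 2:
        degrees[0] += 1  # the stub total must be even to pair all stubs
    return degrees


def configuration_model(degrees, rng):
    """Pair degree stubs uniformly at random.

    Self-loops are dropped and repeated links are merged, so a few nodes
    may end up with a slightly smaller degree than prescribed.
    """
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    adj = {v: set() for v in range(len(degrees))}
    for a, b in zip(stubs[::2], stubs[1::2]):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj
```

A call such as `configuration_model(powerlaw_degree_sequence(1000, 2.5, 2, 30, rng), rng)` with `rng = random.Random(0)` uses placeholder parameter values, not the ones of the study.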

For each given pair of and , we generate independently scale-free networks using the configuration model with nodes and power exponent . On each generated network, we carry out the information spread of 100 messages independently according to the SVFR model where the initial node that creates/shares the message is chosen uniformly at random. In total, we obtain cascade trees for the given and .

First, we explore the distribution of the sizes of the cascade trees in both the WeChat dataset and in our SVFR model. As shown in Figure 5, the distribution of the sizes of the observed WeChat cascade trees is approximately a power-law distribution. Since we are interested in the cascade trees with a size larger than , which correspond to the messages that could propagate to a certain extent, we fit the tail part of the distribution when the size is larger than or equal to . The power exponent is approximately . The power-law cascade size distribution has also been observed in other social networks, such as Twitter [11, 10, 54], Flickr [14], Digg [15] and Sina Weibo [16].

Figure 5: Distribution of the size of the WeChat cascade trees with the curve fitting for the tail where the size is larger than or equal to 100.

We take as an example the SVFR model with the average view probability whereas the degree scaling parameter varies. Figure 6 illustrates how the size distribution of the cascade trees generated by our SVFR model changes as the degree scaling parameter increases.

Figure 6: Cascade size distribution of the SVFR model for different degree scaling parameter . The underlying scale-free network size is and the average view probability is . The power-law part of the tail has been fitted. Each figure is obtained by independent realisations of the SVFR process on each of the independently generated underlying scale-free networks.

It turns out that when the degree scaling parameter is zero, i.e. when all the nodes have the same probability to view a message, the cascade size distribution has a peak in the tail. In this case, the cascade size of our model does not follow a power-law distribution as the WeChat cascades do, but has a significantly higher probability to be large. A similar observation holds when the degree scaling parameter is small. When the view probability or the network size increases, the separation between the power-law decrease and the peak in the size distribution becomes even more apparent. As the degree scaling parameter increases, the cascade size distribution becomes a power-law distribution, the same as observed in WeChat. The hubs play a key role in such a change in the size distribution. First, a hub (a high degree node in the underlying scale-free network) has a higher probability than low degree nodes that one of its neighbors forwards the message. Second, given the same average view probability, a hub has a higher probability to view and thus forward a message when the degree scaling parameter is smaller. Third, the forwarding of a message by a hub allows its large number of neighbours to further view and forward the message, leading potentially to a large cascade. Hence, hubs facilitate the appearance of large cascades, especially when the degree scaling parameter is small. This explains as well why the largest possible cascade size decreases as the degree scaling parameter increases. Figure 7 further supports our explanation. We look into the maximal degree (in the underlying social network) of the nodes that have forwarded the information in a cascade tree in relation to the size of the cascade. As this maximal degree increases, i.e. as a higher degree node is involved in the forwarding of the message, an abrupt jump occurs in the cascade size, when . Hence, the bulk in the size distribution corresponds to the large cascades where hubs are involved in forwarding the information. When , the increase of the cascade size with the maximal degree is relatively continuous.

Figure 7: The size of a cascade tree generated by the SVFR model versus the maximum degree in the underlying social network of the nodes that have forwarded the message in the cascade tree when (a) and (b) . Cascade trees larger than in size are considered.

Figure 6 suggests that the degree scaling parameter should not be small in order to capture the power-law size distribution in the WeChat dataset.

Furthermore, we explore how the power exponent/slope of the power-law cascade size distribution generated by the SVFR model is influenced by the size of the underlying network, the average view probability and the degree scaling parameter . As shown in Figure 8, the exponent is obtained via the power-law curve fitting of the power-law decreasing part of the size distribution. Although different curve fitting methods may influence the obtained power exponents [51], we adopt this simplest method to illustrate our methodology to identify the parameters of the proposed SVFR model.
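The exact fitting procedure is not specified above. One simple possibility, shown here purely as an assumption, is an ordinary least-squares line fitted to the distribution on log-log axes, whose negated slope gives the power exponent. The function name and the input layout are hypothetical.

```python
import math


def fit_power_law_exponent(sizes, probabilities):
    """Least-squares slope of log(probability) against log(size).

    Returns the exponent a of a fitted relation p ~ size ** (-a).
    Points with non-positive size or probability are skipped.
    """
    points = [(math.log(s), math.log(p))
              for s, p in zip(sizes, probabilities) if s > 0 and p > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return -sxy / sxx
```

Only the points in the power-law decreasing part of the distribution should be passed in, for example the sizes at or above the chosen tail threshold.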

As shown in Figure 8, the power exponent is insensitive to the size of the underlying networks, though the average cascade size may depend on the size of the underlying network. We will focus on the underlying network size , which is large as well as feasible for simulations. A smaller degree scaling parameter and a larger average view probability contribute to a smaller power exponent, and thus to large cascade trees with a higher probability. The power exponent observed in WeChat can be approximated by our SVFR model when and or and or and .

Figure 8: The power exponent of the power-law cascade size distribution generated by the SVFR model as a function of the size of the underlying network, the average view probability and the degree scaling parameter. For each set of parameters, the cascade size distribution is obtained from 100 iterations of the SVFR information spread on each of 100 independently generated underlying social networks.

Finally, we investigate the average path length and the degree variance of the cascade trees, in relation to the cascade tree size, produced by our SVFR model with the aforementioned three sets of parameters, which already capture the cascade size distribution of WeChat well.

Figure 9: The average path length and degree standard deviation of the cascade trees in WeChat, of the RRT structural model and of the cascade trees generated by the SVFR model. We consider the SVFR model with the three sets of parameters and that capture the WeChat cascade size distribution well. The underlying networks of the SVFR model are scale-free with size . Given the parameters and , we perform realisations of the SVFR model on each of the independently generated underlying networks, leading to cascade trees. The cascade trees generated by SVFR are grouped according to their sizes: [100,200), [200,400), [400,800) and [800,1600]. The average and standard deviation of the two key properties are derived for each group and plotted as a function of the medium size of the group. When and , the cascade trees generated by the SVFR model are all smaller than 800 in size. Given the parameter and tree size, we carry out iterations of generating cascade trees using the RRT model and obtain the average and standard deviation (error bar) of these two properties.

Figure 9 shows that the cascade trees generated by the SVFR model with and approximate the cascade trees in WeChat well with respect to their average path length and degree variance (standard deviation). Like the WeChat cascade trees, the cascade trees generated by the SVFR model are well bounded by the RRT models with and and are closer to the RRT models with , which verifies the consistency of the RRT and SVFR models.
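The two tree properties compared here, and the grouping by size used in Figure 9, can be computed along the following lines. This is a hedged sketch rather than the authors' implementation: it assumes each cascade tree is available as an undirected edge list, takes the average path length to be the mean hop count over all node pairs (obtained from the sum of pairwise distances via subtree sizes), and uses the population standard deviation; the function names are hypothetical.

```python
import math
from collections import defaultdict, deque
from statistics import mean, pstdev

# Size groups from the Figure 9 caption; the last one is closed at 1600.
SIZE_BINS = ((100, 200), (200, 400), (400, 800), (800, 1601))


def tree_metrics(edges):
    """Return (average path length, degree standard deviation) of a tree."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    n = len(adj)

    # Breadth-first order from an arbitrary root.
    root = next(iter(adj))
    parent = {root: None}
    order = []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)

    # Each edge separates the tree into parts of size k and n - k and lies
    # on k * (n - k) shortest paths; summing gives the total pairwise distance.
    subtree = {u: 1 for u in order}
    total_distance = 0
    for u in reversed(order):
        p = parent[u]
        if p is not None:
            total_distance += subtree[u] * (n - subtree[u])
            subtree[p] += subtree[u]

    avg_path_length = total_distance / (n * (n - 1) / 2)
    degree_std = pstdev(len(nbrs) for nbrs in adj.values())
    return avg_path_length, degree_std


def summarize_by_size(trees, bins=SIZE_BINS):
    """Group trees by node count and report (mean, std) of both metrics."""
    grouped = defaultdict(list)
    for edges in trees:
        n = len(edges) + 1  # a tree with n nodes has n - 1 edges
        for lo, hi in bins:
            if lo <= n < hi:
                grouped[(lo, hi)].append(tree_metrics(edges))
                break
    summary = {}
    for key, values in grouped.items():
        paths = [p for p, _ in values]
        stds = [s for _, s in values]
        summary[key] = {
            "avg_path_length": (mean(paths), pstdev(paths)),
            "degree_std": (mean(stds), pstdev(stds)),
        }
    return summary
```

The subtree-size approach runs in time linear in the tree size, which keeps the computation cheap even for the largest group.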

As mentioned before, it would be interesting to explore even larger underlying networks, which could lead to larger cascade trees and thus improve the validation of the SVFR model by capturing features of cascade trees over a broader range of sizes.

Our SVFR model explains well the cascade size distribution (including the power-law decay exponent), the average path length and the degree variance of the cascade trees in WeChat, and suggests that a user with a large number of friends may have a lower probability of viewing a message shared by a friend.

5 Conclusion

The cascade trees that describe the information spread trajectories in social networks have been widely studied. In this work, we rely on data extracted from the WeChat social network as a test bed to advance information diffusion analysis methods in two respects.

Firstly, we propose to model the cascade tree topology by random recursive trees (RRTs). The RRT model reproduces or explains two fundamental properties of the cascade trees in the WeChat network, i.e. the average path length and the degree variance in relation to the tree size. The single identified parameter of the RRT model allows us, for the first time, to quantify how deep (viral-like spread) or shallow (broadcast-type spread) a class of cascade trees is. Hence, we can compare or classify different online networks according to whether the information spread on each network is more broadcast-like or viral-like. The RRT model also unravels some interesting phenomena in the growth of cascade trees, such as the emergence of hubs.
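As an illustration of this growth mechanism, the sketch below grows a tree one node at a time, attaching each new node to an existing node chosen with a degree-dependent probability. The attachment rule used here, a probability proportional to `degree ** alpha`, is an assumption made for the example, as are the parameter name `alpha` and the function name; the sketch is not claimed to reproduce the exact form used in the model above.

```python
import random


def grow_rrt(n, alpha, rng=None):
    """Grow a tree with n nodes and return its (parent, child) edges.

    Assumption: a new node attaches to existing node i with probability
    proportional to degree(i) ** alpha. Under this rule alpha = 0 gives
    uniform attachment, and larger alpha favours high-degree nodes (hubs).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    rng = rng or random.Random()

    edges = [(0, 1)]  # start from a single edge so every degree is >= 1
    degree = [1, 1]
    for new in range(2, n):
        weights = [d ** alpha for d in degree]
        parent = rng.choices(range(new), weights=weights, k=1)[0]
        edges.append((parent, new))
        degree[parent] += 1
        degree.append(1)
    return edges


if __name__ == "__main__":
    rng = random.Random(0)
    for alpha in (0.0, 1.0):
        edges = grow_rrt(1000, alpha, rng)
        deg = {}
        for u, v in edges:
            deg[u] = deg.get(u, 0) + 1
            deg[v] = deg.get(v, 0) + 1
        print(alpha, max(deg.values()))  # largest degree for each setting
```

Recomputing the weights at every step keeps the code short at a quadratic cost, which is acceptable for trees of the sizes considered here.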

Secondly, we introduce the SVFR stochastic model to capture the information diffusion process on a network. The model encodes three types of user reactions to a received message, namely ignoring, viewing or forwarding it, and is shown to capture and explain three main properties of the WeChat cascade trees: the average path length, the degree variance and the tree size distribution. Our model calibration suggests that a WeChat user with a large number of friends tends to have a low probability of viewing a message shared by his/her friends. This finding is consistent with the cognitive and biological constraints on users predicted by Dunbar’s theory [47, 48].
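To make the three reactions concrete, the following is a deliberately simplified, discrete-round sketch of a view/forward cascade on a given network. Everything specific in it is an assumption for illustration: the per-node view probability is taken proportional to `degree ** (-alpha)` and rescaled so that its network average equals `avg_view`, the forward probability `forward_prob` is the same for all users, a user who ignores a message stays susceptible to later exposures, and all function and parameter names are hypothetical. It should not be read as the exact SVFR dynamics.

```python
import random


def view_probabilities(adj, avg_view, alpha):
    """Per-node view probability, decreasing with degree when alpha > 0.

    Assumes every node has at least one neighbour. Values are capped at 1,
    so the average equals avg_view only when no cap is hit.
    """
    weights = {u: len(nbrs) ** (-alpha) for u, nbrs in adj.items()}
    scale = avg_view * len(adj) / sum(weights.values())
    return {u: min(1.0, scale * w) for u, w in weights.items()}


def simulate_cascade(adj, source, avg_view, forward_prob, alpha, rng=None):
    """Return the cascade tree as (sharer, viewer) edges."""
    rng = rng or random.Random()
    p_view = view_probabilities(adj, avg_view, alpha)
    state = {u: "S" for u in adj}  # S: susceptible
    state[source] = "F"            # F: forwarding (the creator shares first)
    edges = []
    frontier = [source]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if state[v] != "S":
                    continue
                if rng.random() >= p_view[v]:
                    continue  # v ignores this exposure and stays susceptible
                edges.append((u, v))  # v views the message shared by u
                if rng.random() < forward_prob:
                    state[v] = "F"
                    next_frontier.append(v)
                else:
                    state[v] = "V"  # V: viewed without forwarding
            state[u] = "R"  # R: removed after forwarding once
        frontier = next_frontier
    return edges
```

With `alpha` set to zero all users view with the same probability, whereas a positive `alpha` lowers the view probability of high-degree users, which is the limited-attention effect discussed above.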

The WeChat dataset served as an excellent test bed for the above-mentioned contributions because of the rich user actions it captures, which reveal how users react to the messages forwarded to them. We believe, however, that our contributions can serve as a starting point to systematically explore the structure and dynamics of information diffusion in general social networks, not limited to WeChat. The proposed SVFR stochastic model can be applied to other online social networks as well, to explore e.g. whether other types of heterogeneity exist. For example, the probability of viewing or forwarding a message may depend on its content. Another promising direction for future research is to include the time delay in the information diffusion model in order to explain e.g. how fast a message can reach a certain number of users.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

Conceived and designed the experiment: LL, BQ, BC, HW. Performed the experiment: LL. Analyzed the data: LL, BQ, AH and HW. Wrote the paper: LL and HW. All authors read and approved the final manuscript.

Acknowledgements

The authors would like to thank Linnan He (The School of Communication and Design, Sun Yat-sen University) and Yichong Bai (Fibonacci Consulting Co. Ltd.) for providing the WeChat dataset. We also wish to thank the National Natural Science Foundation of China for support under Grant Nos. 71673292, 61503402, 61403402, 61374185 and 71373282.

References

  • [1] Kietzmann, J.H., Hermkens, K., McCarthy, I.P., Silvestre, B.S.: Social media? get serious! understanding the functional building blocks of social media. Business horizons 54(3), 241–251 (2011)
  • [2] Guille, A., Hacid, H., Favre, C., Zighed, D.A.: Information diffusion in online social networks: A survey. ACM SIGMOD Record 42(2), 17–28 (2013)
  • [3] Obar, J.A., Wildman, S.S.: Social media definition and the governance challenge-an introduction to the special issue. Available at SSRN 2663153 (2015)
  • [4] Zhang, Z.-K., Liu, C., Zhan, X.-X., Lu, X., Zhang, C.-X., Zhang, Y.-C.: Dynamics of information diffusion and its applications on complex networks. Physics Reports 651, 1–34 (2016)
  • [5] Hughes, A.L., Palen, L.: Twitter adoption and use in mass convergence and emergency events. International Journal of Emergency Management 6(3-4), 248–260 (2009)
  • [6] Kaplan, A.M., Haenlein, M.: Users of the world, unite! the challenges and opportunities of social media. Business horizons 53(1), 59–68 (2010)
  • [7] Khondker, H.H.: Role of the new media in the arab spring. Globalizations 8(5), 675–679 (2011)
  • [8] Kwak, H., Lee, C., Park, H., Moon, S.: What is twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web, pp. 591–600 (2010). ACM
  • [9] Bakshy, E., Hofman, J.M., Mason, W.A., Watts, D.J.: Everyone’s an influencer: quantifying influence on twitter. In: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, pp. 65–74 (2011). ACM
  • [10] Baños, R.A., Borge-Holthoefer, J., Moreno, Y.: The role of hidden influentials in the diffusion of online information cascades. EPJ Data Science 2(1), 1 (2013)
  • [11] Taxidou, I., Fischer, P.M.: Online analysis of information diffusion in twitter. In: Proceedings of the 23rd International Conference on World Wide Web, pp. 1313–1318 (2014). ACM
  • [12] Goel, S., Anderson, A., Hofman, J., Watts, D.J.: The structural virality of online diffusion. Management Science 62(1), 180–196 (2015)
  • [13] Bakshy, E., Rosenn, I., Marlow, C., Adamic, L.: The role of social networks in information diffusion. In: Proceedings of the 21st International Conference on World Wide Web, pp. 519–528 (2012). ACM
  • [14] Cha, M., Mislove, A., Gummadi, K.P.: A measurement-driven analysis of information propagation in the flickr social network. In: Proceedings of the 18th International Conference on World Wide Web, pp. 721–730 (2009). ACM
  • [15] Ghosh, R., Lerman, K.: A framework for quantitative analysis of cascades on networks. In: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, pp. 665–674 (2011). ACM
  • [16] Bao, P., Shen, H.-W., Chen, W., Cheng, X.-Q.: Cumulative effect in information diffusion: empirical study on a microblogging network. PloS one 8(10), 76027 (2013)
  • [17] Feng, L., Hu, Y., Li, B., Stanley, H.E., Havlin, S., Braunstein, L.A.: Competing for attention in social media under information overload conditions. PloS one 10(7), 0126090 (2015)
  • [18] Li, Y., Qian, M., Jin, D., Hui, P., Vasilakos, A.V.: Revealing the efficiency of information diffusion in online social networks of microblog. Information Sciences 293, 383–389 (2015)
  • [19] Wang, R., Rho, S., Chen, B.-W., Cai, W.: Modeling of large-scale social network services based on mechanisms of information diffusion: Sina weibo as a case study. Future Generation Computer Systems (2016)
  • [20] Zhang, B., Qian, Z., Lu, S.: Structure pattern analysis and cascade prediction in social networks. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 524–539 (2016). Springer
  • [21] Bounova, G., de Weck, O.: Overview of metrics and their correlation patterns for multiple-metric topology analysis on heterogeneous graph ensembles. Physical Review E 85(1), 016117 (2012)
  • [22] Crane, R., Sornette, D.: Robust dynamic classes revealed by measuring the response function of a social system. Proceedings of the National Academy of Sciences 105(41), 15649–15653 (2008)
  • [23] Wu, F., Huberman, B.A.: Novelty and collective attention. Proceedings of the National Academy of Sciences 104(45), 17599–17601 (2007)
  • [24] Richier, C., Altman, E., Elazouzi, R., Jimenez, T., Linares, G., Portilla, Y.: Bio-inspired models for characterizing youtube viewcount. In: Advances in Social Networks Analysis and Mining (ASONAM), 2014 IEEE/ACM International Conference On, pp. 297–305 (2014). IEEE
  • [25] Cheng, J., Adamic, L.A., Kleinberg, J.M., Leskovec, J.: Do cascades recur? In: International Conference on World Wide Web (2016)
  • [26] Goldenberg, J., Libai, B., Muller, E.: Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing letters 12(3), 211–223 (2001)
  • [27] Granovetter, M.: Threshold models of collective behavior. American journal of sociology, 1420–1443 (1978)
  • [28] Li, Q., Braunstein, L.A., Wang, H., Shao, J., Stanley, H.E., Havlin, S.: Non-consensus opinion models on complex networks. Journal of Statistical Physics 151(1), 92–112 (2013). doi:10.1007/s10955-012-0625-4
  • [29] Qu, B., Li, Q., Havlin, S., Stanley, H.E., Wang, H.: Nonconsensus opinion model on directed networks. Phys. Rev. E 90, 052811 (2014). doi:10.1103/PhysRevE.90.052811
  • [30] Hethcote, H.W.: The mathematics of infectious diseases. SIAM review 42(4), 599–653 (2000)
  • [31] Pastor-Satorras, R., Vespignani, A.: Epidemic spreading in scale-free networks. Physical review letters 86(14), 3200 (2001)
  • [32] Newman, M.E.: Spread of epidemic disease on networks. Physical review E 66(1), 016128 (2002)
  • [33] Yang, J., Leskovec, J.: Modeling information diffusion in implicit networks. In: 2010 IEEE International Conference on Data Mining, pp. 599–608 (2010). IEEE
  • [34] Tencent: Tencent Announces 2016 Second Quarter and Interim Results. http://www.tencent.com/en-us/ir/news/2016.shtml
  • [35] Li, Z., Chen, L., Bai, Y., Bian, K., Zhou, P.: On diffusion-restricted social network: A measurement study of wechat moments. arXiv preprint arXiv:1602.00193 (2016)
  • [36] Anderson, A., Huttenlocher, D., Kleinberg, J., Leskovec, J., Tiwari, M.: Global diffusion via cascading invitations: Structure, growth, and homophily. In: Proceedings of the 24th International Conference on World Wide Web, pp. 66–76 (2015). ACM
  • [37] Rudas, A., Tóth, B., Valkó, B.: Random trees and general branching processes. arXiv preprint math/0503728 (2005)
  • [38] Krapivsky, P.L., Redner, S.: Organization of growing random networks. Physical Review E 63(6), 066123 (2001)
  • [39] Schiavenza, M.: Wechat—not weibo—is the chinese social network to watch. The Atlantic 30 (2013)
  • [40] Lien, C.H., Cao, Y.: Examining wechat users’ motivations, trust, attitudes, and positive word-of-mouth: Evidence from china. Computers in Human Behavior 41, 104–111 (2014)
  • [41] Wiener, H.: Structural determination of paraffin boiling points. Journal of the American Chemical Society 69(1), 17–20 (1947)
  • [42] Li, C., Wang, H., de Haan, W., Stam, C.J., Mieghem, P.V.: The correlation of metrics in complex networks with applications in functional brain networks. Journal of Statistical Mechanics: Theory and Experiment 2011(11), 11018 (2011)
  • [43] Kunegis, J., Blattner, M., Moser, C.: Preferential attachment in online networks: Measurement and explanations. In: Proceedings of the 5th Annual ACM Web Science Conference, pp. 205–214 (2013). ACM
  • [44] Su, C., Feng, Q., Hu, Z.: Uniform recursive trees: Branching structure and simple random downward walk. Journal of mathematical analysis and applications 315(1), 225–243 (2006)
  • [45] Van Mieghem, P.: Performance Analysis of Complex Networks and Systems. Cambridge University Press, Cambridge, United Kingdom (2014)
  • [46] Szabó, G., Alava, M., Kertész, J.: Shortest paths and load scaling in scale-free trees. Physical Review E 66(2), 026101 (2002)
  • [47] Dunbar, R.I.M.: Neocortex size as a constraint on group size in primates. Journal of Human Evolution 22(6), 469–493 (1992). doi:10.1016/0047-2484(92)90081-J
  • [48] Gonçalves, B., Perra, N., Vespignani, A.: Modeling users’ activity on twitter networks: Validation of dunbar’s number. PLOS ONE 6(8), 1–5 (2011). doi:10.1371/journal.pone.0022656
  • [49] Barabási, A.-L., Albert, R.: Emergence of scaling in random networks. science 286(5439), 509–512 (1999)
  • [50] Newman, M.E.: Power laws, pareto distributions and zipf’s law. Contemporary physics 46(5), 323–351 (2005)
  • [51] Clauset, A., Shalizi, C.R., Newman, M.E.: Power-law distributions in empirical data. SIAM review 51(4), 661–703 (2009)
  • [52] Hernandez, J.M., Kleiberg, T., Wang, H., Mieghem, P.V.: A qualitative comparison of power law generators. In: International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS 2007) (2007)
  • [53] Cohen, R., Erez, K., Ben-Avraham, D., Havlin, S.: Resilience of the internet to random breakdowns. Physical review letters 85(21), 4626 (2000)
  • [54] Goel, S., Watts, D.J., Goldstein, D.G.: The structure of online diffusion networks. In: Proceedings of the 13th ACM Conference on Electronic Commerce, pp. 623–638 (2012). ACM