Analytical study of quality-biased competition dynamics for memes in social media
The spreading of news, memes and other pieces of information via online social platforms has a strong and growing impact on modern societies, with potentially enormous consequences, both beneficial and catastrophic. In this work we consider a recently introduced model for information diffusion in social media that explicitly takes into account the competition among a large number of items of diverse quality. We map the meme dynamics onto a one-dimensional diffusion process that we solve analytically, deriving the lifetime and popularity distributions of individual memes. We also present a mean-field approach that reproduces the average stationary properties of the dynamics. In this way we understand and control the role of the different ingredients of the model, opening the way to the inclusion of additional, more realistic features.
Understanding how information spreads in social media is a topic of the utmost interest, as it is fundamental for devising strategies aimed at fostering the diffusion of beneficial information or countering the dangerous spread of fake news Howell et al. (2013); Del Vicario et al. (2016); Vosoughi et al. (2018). Activity in this area has boomed in recent years Kwak et al. (2010); Lerman and Ghosh (2010); González-Bailón et al. (2011); Bakshy et al. (2011); Baños et al. (2013); Cheng et al. (2014); Nishi et al. (2016); Pramanik et al. (2017); Wegrzycki et al. (2017). From the point of view of statistical physics, information spreading is a prominent example of a collective macroscopic phenomenon emerging in a self-organized manner from the spontaneous activity of a large number of individual elements Castellano et al. (2009); Buchanan (2007). The investigation of information spreading is particularly challenging from both an empirical and a theoretical point of view. The existence of many different social media platforms, each with its own features that often change over time, provides a wealth of data but leaves the issues of universality and reproducibility wide open. From the modeling point of view, identifying a limited number of relevant mechanisms and crucial observables is highly nontrivial.
The topology of the interaction pattern among users in online social media, which is typically very heterogeneous, is one of the ingredients usually taken into account. Another fundamental factor affecting the way news, memes or rumors are diffused is information overload. When online, individuals are hit by a steady and overwhelming flow of messages; finite attention and limited memory strongly influence which information is propagated further, and how. This results in a competition among a large number of items diffusing simultaneously, which is a key ingredient of many models for information spreading Wu and Huberman (2007); Weng et al. (2012); Gleeson et al. (2014, 2016). A third ingredient that plays a role in determining the fate of messages in online media is the variability of the “quality” of the items: some pieces of information may be intrinsically more appealing and thus more likely to be shared by online users. A very recent work by Qiu et al. (2017) combined these three elements to study the interplay between a heterogeneous quality distribution and information overload in online social media (with particular reference to Twitter), with the goal of investigating whether a good tradeoff between discriminative power and quality diversity is possible.
Although highly stylized, the model for meme dynamics introduced in Ref. Qiu et al. (2017) contains several relevant ingredients of the real phenomenon, in particular the original element that the competition among different memes favors those with a higher intrinsic quality. For this reason we call it the quality-biased competition (QBC) model. In this paper we study the QBC dynamics in detail by considering some carefully devised simplifications, which make an analytical treatment possible and provide explicit formulas for the behavior of the main observables. In this way we achieve a full understanding of the model phenomenology and of its dependence on the values of the different parameters.
II The QBC model
We consider the model for meme spreading introduced in Ref. Qiu et al. (2017). $N$ agents (or users), each equipped with a memory containing at most $\alpha$ memes, are the nodes of a static network. Memories are ordered lists, with positions running from $\alpha$ (top) to 1 (bottom). At each time step an individual is selected uniformly at random and transmits a meme to all her neighbors. With probability $1-\mu$, the transmitted meme is an existing one, taken from the agent's memory; otherwise, with probability $\mu$, a new meme is created. In both cases, the transmitted meme is put at the top (position $\alpha$) of the memory of the agents involved (both the transmitter and the receivers), shifting all other memes downward. Upon creation, each meme is assigned a random fitness value $f$ between 0 and 1, a proxy of its quality. When a user selects an old meme for transmission, the probability of selecting a given meme is proportional to its fitness. In this way high fitness increases the chance that a meme spreads. Apart from this bias, the dynamics can be seen as the competition among many susceptible-infected spreading processes in a metapopulation framework Pastor-Satorras et al. (2015).
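To make the rules above concrete, here is a minimal agent-based sketch, assuming the annealed contact pattern in which every agent has a single, freshly drawn random neighbor at each step (the simplest topology considered in the next section). The description leaves a few details open, so the sketch makes labeled assumptions: an agent with an empty memory always creates a new meme, and a re-shared meme is moved (not duplicated) to the top of the sender's memory. The function `simulate_qbc` and its parameter names are hypothetical. Lifetimes are counted in elementary time steps, and popularity counts every transmission, including the one that accompanies creation.

```python
import random
from collections import deque


def simulate_qbc(n_agents, alpha, mu, steps, seed=0):
    """Sketch of the QBC dynamics on an annealed network where each node has one link.

    n_agents: number of agents; alpha: memory (feed) size; mu: probability of
    creating a new meme. Returns (lifetimes, popularities) of the memes that
    went extinct during the run.
    """
    rng = random.Random(seed)
    feeds = [deque() for _ in range(n_agents)]  # feeds[i][0] is the top slot
    fitness, birth, popularity, copies = {}, {}, {}, {}
    lifetimes, popularities = [], []
    next_id = 0

    def push(agent, meme, t):
        # Put a copy of `meme` on top of the agent's feed (first-in first-out).
        feed = feeds[agent]
        feed.appendleft(meme)
        copies[meme] += 1
        if len(feed) > alpha:
            old = feed.pop()  # the meme in the last slot is forgotten
            copies[old] -= 1
            if copies[old] == 0:  # no copies left anywhere: extinction
                lifetimes.append(t - birth[old])
                popularities.append(popularity[old])
                for table in (fitness, birth, popularity, copies):
                    del table[old]

    for t in range(1, steps + 1):
        sender = rng.randrange(n_agents)
        receiver = rng.randrange(n_agents - 1)  # annealed: new random neighbour
        if receiver >= sender:
            receiver += 1
        feed = feeds[sender]
        if not feed or rng.random() < mu:
            meme, next_id = next_id, next_id + 1
            fitness[meme] = 1.0 - rng.random()  # uniform fitness in (0, 1]
            birth[meme], popularity[meme], copies[meme] = t, 0, 0
            push(sender, meme, t)
        else:
            # Choose an existing meme with probability proportional to its fitness.
            memes = list(feed)
            meme = rng.choices(memes, weights=[fitness[m] for m in memes])[0]
            feed.remove(meme)  # ASSUMPTION: move (not copy) it to the sender's top slot
            feed.appendleft(meme)
        popularity[meme] += 1  # one transmission to the (single) neighbour
        push(receiver, meme, t)
    return lifetimes, popularities
```

For example, `simulate_qbc(1000, 10, 0.1, 10**6)` returns the lifetimes and popularities of all memes that went extinct during one million steps; memes still alive at the end are not counted, which slightly biases the tails of both distributions.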
Starting from an initial configuration in which all memories are empty, memes are introduced and copied until some of the memories fill up. When all slots in a memory are occupied and a new meme must enter, the item in the last position is removed and forgotten by the agent. Memories thus work according to a “first-in first-out” rule, mimicking what happens in users' feeds on some social networks, such as Twitter. After an initial transient, a steady state is reached in which all memory slots in the system are occupied. Memes are continuously created, diffuse over the network and eventually go extinct. Two quantities characterize the dynamics of a meme: its lifetime, i.e., the time elapsed between its creation and its extinction, and its popularity, defined as the total number of times the meme is transmitted, throughout its lifetime, from an agent to one of her neighbors.
III Robustness with respect to the topology
We first check how much the model phenomenology depends on the details of the interaction pattern by performing numerical simulations on several types of networks (see Supplementary Material, SM). The distributions of the main observables turn out to be qualitatively robust with respect to changes of the underlying network (see Fig. 1). Both the lifetime and the popularity distributions have broad power-law tails, with an exponential cutoff at a scale that grows as $\mu$, the rate of creation of new items, goes to zero. The lifetime distribution also exhibits a peak at a time corresponding to the average time needed for a meme that is never shared to disappear from the memory of the agent that created it. The average popularity and the average lifetime grow strongly with the fitness when $\mu$ is small. The effect of the parameter is very weak (see SM).
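The text does not specify how these broad distributions are histogrammed; logarithmic binning is a common choice for power-law tails, and the helper below sketches it (the name `log_binned_pdf` and the default of five bins per decade are our assumptions). Dividing the counts by the bin width keeps the estimate a proper density, so a tail $P(T) \sim T^{-\tau}$ appears as a straight line of slope $-\tau$ on log-log axes.

```python
import math


def log_binned_pdf(samples, bins_per_decade=5):
    """Estimate a probability density from positive samples using log-spaced bins.

    Returns (centers, densities): geometric bin centres and counts divided by
    (bin width * number of samples); empty bins are skipped.
    """
    x_min, x_max = min(samples), max(samples)
    n_bins = max(1, math.ceil(bins_per_decade * math.log10(x_max / x_min)))
    edges = [x_min * 10 ** (i / bins_per_decade) for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for x in samples:
        i = int(bins_per_decade * math.log10(x / x_min))
        counts[min(i, n_bins - 1)] += 1  # clamp x_max into the last bin
    centers, densities = [], []
    for i, c in enumerate(counts):
        if c:
            centers.append(math.sqrt(edges[i] * edges[i + 1]))
            densities.append(c / ((edges[i + 1] - edges[i]) * len(samples)))
    return centers, densities
```

Applied to the lifetimes returned by a simulation, the densities can be plotted directly on log-log axes; bins deep in the tail contain few events and are correspondingly noisy.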
The overall picture remains the same even if the contact pattern is an annealed random regular graph where each node has a single connection. This suggests that a mean-field approach, which effectively considers a regular annealed network as contact pattern, may provide an accurate description of the model dynamics.
IV A microscopic approach
We now focus on the behavior of an individual meme $m$ of fitness $f$. We define $x_{m,i}(t)$ as the position of meme $m$ in the memory of agent $i$ at time $t$: $x_{m,i} = \alpha$ corresponds to the top position (a newly created or transmitted meme), while $x_{m,i} = 1$ means that the meme is about to be forgotten. If meme $m$ does not appear in the memory of agent $i$, then $x_{m,i} = 0$. We neglect the case in which an agent has several copies of the same meme in her feed. The quantity $X_m(t) = \sum_i x_{m,i}(t)$ cumulates the positions of the meme in all users' feeds, thus providing information about its overall diffusion. For simplicity we assume that each user is in contact with a single, randomly chosen other user and that, when a user produces a new meme (with probability $\mu$), she simply puts it on top of her memory without immediately sharing it. For the same reason we assume that, when an existing meme is selected for transmission, it is left in its original position in the transmitter's feed instead of being moved to the top of her memory. We checked that both these assumptions have negligible effects. Over time, $X_m$ performs a one-dimensional random walk in the interval $[0, N\alpha]$, where $N$ is the number of agents and $\alpha$ the memory size. $X_m = 0$ is an absorbing boundary (after extinction a meme never reappears) and $X_m = N\alpha$ is a semireflecting boundary (because of our approximations, if the meme is in the first (top) position of all feeds, $X_m$ cannot grow further). The initial condition is $X_m(0) = \alpha$. At each time step the elementary events are:
Apart from different expressions close to the boundaries (see SM for details), the probabilities are:
where $f$ is the fitness of the considered meme and $n$ is the number of individuals having it in their memory.
Eq. (2) follows from the observation that $X_m$ increases by the memory size $\alpha$ (a copy of the meme enters a feed at the top position) if a transmission event takes place (probability $1-\mu$), if the meme is present in the feed of the transmitting user (probability $n/N$) and absent from the feed of the receiver, and if it is selected for transmission among all memes in the feed. This last event occurs with probability $f/\sum_j f_j$, where the sum runs over the memes in the transmitter's feed; in Eq. (2) this ratio is replaced by an approximate expression. $n$, the number of individuals having the meme in their memory, is approximated as
where $\lfloor \cdot \rfloor$ denotes the integer part (floor) of its argument.
As for the probability of a decrease, $X_m$ decreases because the insertion of a meme in a user's feed shifts all other memes downward. An insertion occurs at each time step, irrespective of whether the inserted meme is new or transmitted. Hence this probability equals $n/N$, the fraction of agents whose memory contains the meme, i.e., the probability that the meme is present in the memory involved. From the expressions of the probabilities it is immediately clear that nothing depends on and separately, but only through the combination .
We numerically simulate this random-walk description of the meme dynamics. In the SM we show that the resulting popularity and lifetime distributions closely match those of the original QBC model.
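The random-walk description can be simulated directly. The sketch below implements the two moves described above (an upward jump of size $\alpha$ when the meme is transmitted, a downward step of 1 when an insertion shifts it), with an absorbing boundary at 0 and a semireflecting one at $N\alpha$. Two ingredients are our assumptions, because Eqs. (2) and (4) are not reproduced here: the number of agents holding the meme is estimated as $n = 2X/(\alpha+1)$ (as if the meme sat at the mean position $(\alpha+1)/2$, without the floor), and the selection factor $f/\sum_j f_j$ is replaced by $f/(\alpha\langle f\rangle)$ with $\langle f\rangle = 1/2$ for uniform fitness. The function name `walk_lifetime` is hypothetical.

```python
import random


def walk_lifetime(n_agents, alpha, mu, f, mean_f=0.5, max_steps=10**7, seed=0):
    """Monte Carlo sketch of the random walk of the cumulative position X.

    Returns (lifetime, popularity): the number of steps until X reaches the
    absorbing boundary 0 and the number of upward (transmission) jumps.
    lifetime is None if max_steps is exceeded.
    """
    rng = random.Random(seed)
    x_max = n_agents * alpha  # semireflecting upper boundary
    X = alpha                 # new meme on top of its creator's feed
    popularity = 0
    for step in range(1, max_steps + 1):
        # ASSUMPTION: copies estimated from a mean position (alpha + 1) / 2.
        occ = min(2.0 * X / (alpha + 1) / n_agents, 1.0)  # n / N
        # ASSUMPTION: selection factor f / sum_j f_j approximated by f / (alpha * mean_f).
        w_plus = (1 - mu) * occ * (1 - occ) * f / (alpha * mean_f)
        w_minus = occ  # an insertion hits a memory holding the meme
        r = rng.random()
        if r < w_plus:
            X = min(X + alpha, x_max)
            popularity += 1
        elif r < w_plus + w_minus:
            X -= 1
            if X == 0:
                return step, popularity
    return None, popularity
```

Repeating the call over many seeds and histogramming the returned lifetimes gives a lifetime distribution that can be compared with the full agent-based model.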
In order to make the analytical treatment easier, we further simplify the random-walk description. In particular, we remove the floor function from Eq. (4), we set equal to the term in Eq. (2) and we introduce a numerical constant in the denominator of Eq. (4). See the SM for the justification of these modifications. Again we numerically check the distributions generated by this simplified random-walk description and find (see SM) that they are essentially equal to those of the original QBC dynamics.
At this point we can write down the master equation for the modified random walk, which reads
where Equation (5a) holds for provided one considers for and .
By setting with ranging between and and taking the thermodynamic limit , from the master equation we obtain (see SM) the Fokker-Planck (FP) equation for the probability that the walker is in position at time :
For large we have .
IV.1 Purely diffusive dynamics
In the limit the drift term in Eq. (6) vanishes. We are left with the FP equation of a purely diffusive stochastic process:
which differs from standard diffusion because of the space-dependent diffusion coefficient. The limit also changes the boundary conditions: the boundary in is semireflecting because can decrease with probability or remain unchanged with probability . Thus in the case both boundaries, in and in , are absorbing: . The initial condition is with .
It is possible to find the solution of this equation as an eigenfunction expansion of the operator (see SM for details), obtaining:
where is a Bessel function of the first kind. The characteristic time scale of each eigenfunction is
where the , the zeros of , are approximated as . Using this expression, it is possible to compute (see SM for details) the survival probability in the limit , which turns out to be
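For a Bessel function of the first kind of order $\nu$, the leading term of McMahon's asymptotic expansion gives the $k$-th zero as $j_{\nu,k} \approx (k + \nu/2 - 1/4)\pi$. The sketch below checks how accurate this is for $\nu = 1$, where it reduces to $(k + 1/4)\pi$; the order is our assumption, since it is not recoverable from the text here. It uses only the standard library: $J_\nu$ is summed from its power series and each zero is refined by bisection. The names `bessel_j` and `bessel_zero` are hypothetical.

```python
import math


def bessel_j(nu, x, terms=80):
    """J_nu(x) for integer nu >= 0 from its power series (adequate for x < ~20)."""
    term = (x / 2.0) ** nu / math.factorial(nu)  # m = 0 term
    total = 0.0
    for m in range(terms):
        total += term
        term *= -(x / 2.0) ** 2 / ((m + 1) * (m + 1 + nu))
    return total


def bessel_zero(nu, k, tol=1e-10):
    """k-th positive zero of J_nu, by bisection around McMahon's estimate
    (the bracket is adequate for the first few zeros of low orders)."""
    guess = (k + nu / 2.0 - 0.25) * math.pi
    lo, hi = guess - 1.0, guess + 0.5
    f_lo = bessel_j(nu, lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = bessel_j(nu, mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for k in range(1, 6):
    exact = bessel_zero(1, k)
    approx = (k + 0.25) * math.pi
    print(k, round(exact, 6), round(approx, 6), f"{(approx - exact) / exact:.2%}")
```

The relative error is about 2.5% for the first zero and decreases with $k$, so the approximation is most accurate for the higher eigenfunctions.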
where is after the approximation is made. Based on this result the lifetime distribution can be computed (see SM). In the limit of large , i.e., diverging , it reads
This expression of accounts for the most important feature observed in simulations: for (notice that diverges with ) the distribution decays as a power-law with exponent . Simulations of the QBC model with all memes having fitness agree with this analytical prediction (see Fig. 2).
By means of the standard argument connecting the exponents of power-law tails for scaling variables (see SM) it is possible to relate with the analogous exponent for the popularity distribution: , where . Simulations yield a value close to , from which , in good agreement with simulations (see SM).
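The standard argument linking the two exponents is, presumably, a change of variables: if the popularity $S$ of a meme scales with its lifetime $T$ as $S \sim T^{\gamma}$ and $P(T) \sim T^{-\tau}$, then $p(S)\,dS = P(T)\,dT$ gives $p(S) \sim S^{-[1+(\tau-1)/\gamma]}$. The sketch below encodes this relation and checks it on synthetic data generated with a deterministic scaling $S = T^{\gamma}$. The input values $\tau = 2.5$ and $\gamma = 2$ are purely illustrative, and the function names are hypothetical.

```python
import math
import random


def popularity_exponent(tau_lifetime, gamma):
    """Tail exponent of p(S) when P(T) ~ T**(-tau_lifetime) and S ~ T**gamma.

    Change of variables p(S) dS = P(T) dT with T = S**(1/gamma) gives
    p(S) ~ S**(-(1 + (tau_lifetime - 1) / gamma)).
    """
    return 1.0 + (tau_lifetime - 1.0) / gamma


def hill_pdf_exponent(samples, s_min):
    """Maximum-likelihood (Hill) estimate of a power-law pdf exponent above s_min."""
    tail = [s for s in samples if s >= s_min]
    return 1.0 + len(tail) / sum(math.log(s / s_min) for s in tail)


rng = random.Random(42)
tau, gamma = 2.5, 2.0  # illustrative inputs, not values taken from the paper
# Inverse-transform sampling of P(T) ~ T**(-tau) for T >= 1.
lifetimes = [(1.0 - rng.random()) ** (-1.0 / (tau - 1.0)) for _ in range(100000)]
popularity = [T ** gamma for T in lifetimes]  # deterministic scaling S = T**gamma
print(popularity_exponent(tau, gamma), hill_pdf_exponent(popularity, 1.0))
```

In real simulations the popularity fluctuates around $T^{\gamma}$ rather than following it exactly, so the relation should be read as a prediction for the tail exponent, not for individual memes.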
IV.2 Pure drift
The opposite limit for the FP equation (6) is the pure drift case, which always holds in the large limit, as , unless and :
This equation describes a deterministic motion
i.e., the meme position drifts exponentially toward ; in other words the systematic drift attracts walkers toward the absorbing boundary. This introduces an additional exponential cutoff in the lifetime distribution, which can be globally written as
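The exponential drift can be illustrated numerically. Since the coefficient of the drift term in Eq. (6) is not reproduced here, the sketch uses a hypothetical rate `rate` in $dx/dt = -\mathrm{rate}\cdot x$ and compares forward-Euler integration with the exact solution $x_0 e^{-\mathrm{rate}\cdot t}$. The time needed to fall to any small level grows only logarithmically with $x_0/x$, so in this picture walkers are removed on a finite time scale of order $1/\mathrm{rate}$, which is one way to see why an exponential cutoff appears. The function names are hypothetical.

```python
import math


def drift_euler(x0, rate, dt, t_max):
    """Forward-Euler integration of the pure-drift equation dx/dt = -rate * x."""
    x = x0
    for _ in range(int(round(t_max / dt))):
        x -= rate * x * dt
    return x


def time_to_decay(x0, x_target, rate):
    """Exact time for x(t) = x0 * exp(-rate * t) to fall to x_target."""
    return math.log(x0 / x_target) / rate


x0, rate, t_max = 0.3, 1.0, 5.0  # hypothetical values
print(drift_euler(x0, rate, 1e-3, t_max), x0 * math.exp(-rate * t_max))
```

Halving `dt` roughly halves the discrepancy between the two printed values, as expected for a first-order scheme.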
in agreement with simulations (see Fig. 2, inset).
IV.3 Average over the fitness
In the original definition of the QBC model the fitness is a random variable uniformly distributed between and . Using Eq. (16) it is possible to compute the lifetime distribution also in this case, by averaging over (see SM) and obtaining, in the limit :
The exponent of the lifetime distribution is then , in reasonable agreement with Fig. 1. A similar conclusion can be drawn for the popularity distribution, predicted to decay as .
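How averaging over the fitness can change the exponent can be seen with a toy cutoff. Assume, hypothetically (the actual $f$-dependence follows from Eq. (16)), that each fitness gives $P(T\,|\,f) \propto T^{-\tau} e^{-T/T_c(f)}$ with $T_c(f) = t_0/(1-f)$, a cutoff that diverges as $f \to 1$. Averaging over uniform $f$ gives $T^{-\tau}\,(t_0/T)\,(1 - e^{-T/t_0})$, i.e., the exponent increases by one for $T \gg t_0$. The sketch checks this by numerical integration; `averaged_pdf`, `local_slope` and `t0` are hypothetical names.

```python
import math


def averaged_pdf(t, tau, t0, n=200000):
    """Average over f ~ U(0, 1) of t**(-tau) * exp(-t / t_c(f)) (midpoint rule).

    HYPOTHETICAL cutoff t_c(f) = t0 / (1 - f), which diverges as f -> 1.
    """
    h = 1.0 / n
    s = sum(math.exp(-t * (1.0 - (i + 0.5) * h) / t0) for i in range(n))
    return t ** (-tau) * s * h


def local_slope(t1, t2, tau, t0):
    # Logarithmic derivative of the averaged pdf between t1 and t2.
    num = math.log(averaged_pdf(t2, tau, t0)) - math.log(averaged_pdf(t1, tau, t0))
    return num / (math.log(t2) - math.log(t1))


print(local_slope(1e3, 1e4, tau=2.0, t0=1.0))  # close to -(tau + 1) = -3
```

Memes with fitness close to the maximum dominate the tail, which is why the averaged distribution decays faster than the distribution at any fixed fitness below the maximum.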
In summary, by mapping the QBC dynamics onto a random-walk description, we have derived expressions for the lifetime and popularity distributions which account for the phenomenology observed in numerical simulations.
V A macroscopic approach
The microscopic approach allows one to determine the dependence of the average lifetime on the fitness, and hence to estimate the average number of memes with a given fitness in the steady state. However, the same quantities can be derived much more easily by a simple mean-field approach focused directly on the temporal evolution of the number of memes in each fitness class. For simplicity we assume that fitness values are discretized in $K$ classes $f_k$ ($k = 1, \dots, K$) and, again, that the degree of each agent is 1. We define $M_k(t)$ as the average number of memes with fitness $f_k$ present in the system at time $t$. This quantity changes over time because of two gain and two loss processes. The creation of a new meme, occurring at rate $\mu$, increases $M_k$ by 1 with probability $1/K$ (if the created meme has fitness $f_k$), but it may also reduce $M_k$ by 1 if the agent creating the new meme forgets a meme of fitness $f_k$. This last event occurs with probability $M_k/\sum_j M_j$. The transmission of an existing meme, occurring at rate $1-\mu$, increases $M_k$ if the transmitting agent has a meme with fitness $f_k$ in her feed (probability proportional to $M_k$) and that meme is selected (probability proportional to $f_k$). Overall, the normalized probability of this event is $f_k M_k/\sum_j f_j M_j$. Finally, the transmission event may also lead an agent to forget a meme with fitness $f_k$, with probability $M_k/\sum_j M_j$. The temporal evolution of the $M_k$ is then given by the set of coupled equations
which conserves the total number $\sum_k M_k$. Straightforward numerical integration of Eq. (18) allows one to determine the stationary values of the $M_k$ and hence of the densities $\rho_k = M_k/\bar{M}$, where $\bar{M} = \sum_j M_j/K$ is the average number of memes with fitness $f_k$ if all classes were populated uniformly. The comparison with the outcome of numerical simulations (see Fig. 3) confirms a satisfactory agreement.
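The stationary densities can be obtained with a few lines of code. The sketch below integrates our reading of the four processes described above, written for the fractions $p_k = M_k/\sum_j M_j$ and with time measured in units of $\sum_j M_j$ elementary steps: $dp_k/dt = \mu(1/K - p_k) + (1-\mu)(f_k p_k/F - p_k)$, with $F = \sum_j f_j p_j$. Eq. (18) itself is not reproduced here, so this form, the discretization $f_k = (k - 1/2)/K$ (indexed from 0 in the code) and the function name `mean_field_densities` are assumptions.

```python
def mean_field_densities(mu, n_classes, dt=0.5, steps=100000, tol=1e-13):
    """Forward-Euler integration of the mean-field equations for the fractions
    p_k = M_k / sum_j M_j, with time measured in units of sum_j M_j steps.

    HYPOTHETICAL discretisation: f_k = (k + 1/2) / K for k = 0, ..., K - 1.
    Returns the class fitnesses and the densities rho_k = K * p_k.
    """
    K = n_classes
    fit = [(k + 0.5) / K for k in range(K)]
    p = [1.0 / K] * K  # start from uniformly populated classes
    for _ in range(steps):
        F = sum(fk * pk for fk, pk in zip(fit, p))  # mean fitness of stored memes
        dp = [mu * (1.0 / K - pk)                  # creation: gain 1/K, loss p_k
              + (1.0 - mu) * (fk * pk / F - pk)    # transmission: gain, loss p_k
              for fk, pk in zip(fit, p)]
        p = [pk + dt * d for pk, d in zip(p, dp)]
        if max(abs(d) for d in dp) < tol:
            break
    return fit, [K * pk for pk in p]


fit, rho = mean_field_densities(mu=0.1, n_classes=10)
for f, r in zip(fit, rho):
    print(f"{f:.2f}  {r:.3f}")
```

The right-hand sides sum to $1 - \sum_k p_k$, so the normalization is preserved by the integration, mirroring the conservation of $\sum_k M_k$. Setting the derivatives to zero also gives the closed form $p_k = (\mu/K)/[1 - (1-\mu) f_k/F]$, which shows directly that the density increases with fitness.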
In this paper we have studied the model for information diffusion recently introduced in Ref. Qiu et al. (2017). We have been able to derive analytically the lifetime distribution and other properties for a simplified version of the dynamics, which reproduces the phenomenology of the original model.
Our treatment of the QBC model allows us to understand how the broad tails in the lifetime and popularity distributions observed empirically arise. A power-law distribution with an exponent is in agreement with the observations of Ref. Qiu et al. (2017), where hashtags are used to identify Twitter memes. On the other hand, other studies using hashtags give results quite different from the QBC model predictions. In Ref. Weng et al. (2012) a power-law decay of meme lifetime has been observed, with an exponent . This value is close to, but distinct from, the value predicted by the QBC model in the case of uniformly distributed fitness. Moreover, the strong correlation between meme lifetime and popularity (see SM) is not observed in Twitter data Baños et al. (2013); Gleeson et al. (2016), even when proxies other than hashtags are used to identify memes González-Bailón et al. (2011). A stringent empirical validation of models of online information spreading is itself a difficult task because of the apparent lack of universality. Referring to Twitter data, the identification of memes as URLs leads to a lognormal distribution of popularity Lerman and Ghosh (2010), the analysis of retweet cascades leads to a size distribution with exponent Wegrzycki et al. (2017), possibly with an exponential cutoff Vosoughi et al. (2018), and reply trees give Nishi et al. (2016). Looking at other data sources, the landscape is even more varied: the popularity distribution estimated from Facebook data exhibits a power-law decay with exponent Cheng et al. (2014), while popularity-lifetime correlations are shown to differ between Digg and YouTube data Szabo and Huberman (2010). One could easily change the fitness distribution within the QBC model to achieve better agreement with these observations. In any case, it is clear that the QBC model is a gross oversimplification of the real meme diffusion process in online social media.
To make the QBC dynamics less unrealistic, several hypotheses underlying the present version of the model could be lifted. Some of them, such as a nonuniform fitness distribution or a nonlinear dependence of the selection probability on the fitness, can easily be treated within the present analytical approach. Other fundamental generalizations, such as agent-dependent values of and or heterogeneous rates of individual activation, can be investigated by means of straightforward numerical simulations. One ingredient that would add realism to the QBC dynamics is the presence of agents who do not accept in their feeds (and thus do not spread further) memes they have already seen in the past. The effect of this long-term memory is briefly discussed in the SM; its main consequence is that the popularity and lifetime distributions lose their power-law tails. At a more general level, one of the weak points of the QBC model is its insensitivity to changes of the contact pattern topology. While this feature allows our mean-field approach to be successful, empirical data contradict it: one of the main pieces of evidence is the existence of influential spreaders, i.e., users who, because of their position in the social network, have a disproportionate effect on meme dynamics Bakshy et al. (2011); Baños et al. (2013); Borge-Holthoefer et al. (2012). The investigation of increasingly sophisticated models for information spreading and their comparison with the ever larger body of available empirical data remain a challenging avenue for future research.
- Howell et al. (2013) L. Howell et al., WEF Report 3, 15 (2013).
- Del Vicario et al. (2016) M. Del Vicario, A. Bessi, F. Zollo, F. Petroni, A. Scala, G. Caldarelli, H. E. Stanley, and W. Quattrociocchi, Proceedings of the National Academy of Sciences 113, 554 (2016).
- Vosoughi et al. (2018) S. Vosoughi, D. Roy, and S. Aral, Science 359, 1146 (2018).
- Kwak et al. (2010) H. Kwak, C. Lee, H. Park, and S. Moon, in Proceedings of the 19th International Conference on World Wide Web (ACM, New York, NY, USA, 2010), WWW ’10, pp. 591–600.
- Lerman and Ghosh (2010) K. Lerman and R. Ghosh, in Proc. 4th Int. Conf. on Weblogs and Social Media (ICWSM) (2010), pp. 90–97.
- González-Bailón et al. (2011) S. González-Bailón, J. Borge-Holthoefer, A. Rivero, and Y. Moreno, Scientific Reports 1, 197 (2011).
- Bakshy et al. (2011) E. Bakshy, J. M. Hofman, W. A. Mason, and D. J. Watts, in Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (ACM, New York, NY, USA, 2011), WSDM ’11, pp. 65–74.
- Baños et al. (2013) R. A. Baños, J. Borge-Holthoefer, and Y. Moreno, EPJ Data Science 2, 6 (2013).
- Cheng et al. (2014) J. Cheng, L. Adamic, P. A. Dow, J. M. Kleinberg, and J. Leskovec, in Proceedings of the 23rd International Conference on World Wide Web (ACM, New York, NY, USA, 2014), WWW ’14, pp. 925–936.
- Nishi et al. (2016) R. Nishi, T. Takaguchi, K. Oka, T. Maehara, M. Toyoda, K.-i. Kawarabayashi, and N. Masuda, Social Network Analysis and Mining 6, 26 (2016).
- Pramanik et al. (2017) S. Pramanik, Q. Wang, M. Danisch, J.-L. Guillaume, and B. Mitra, Social Network Analysis and Mining 7, 41 (2017).
- Wegrzycki et al. (2017) K. Wegrzycki, P. Sankowski, A. Pacuk, and P. Wygocki, in Proceedings of the 26th International Conference on World Wide Web (International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 2017), WWW ’17, pp. 569–576.
- Castellano et al. (2009) C. Castellano, S. Fortunato, and V. Loreto, Rev. Mod. Phys. 81, 591 (2009).
- Buchanan (2007) M. Buchanan, The social atom (Bloomsbury, New York, NY, USA, 2007).
- Wu and Huberman (2007) F. Wu and B. A. Huberman, Proceedings of the National Academy of Sciences 104, 17599 (2007).
- Weng et al. (2012) L. Weng, A. Flammini, A. Vespignani, and F. Menczer, Scientific Reports 2, 335 (2012).
- Gleeson et al. (2014) J. P. Gleeson, J. A. Ward, K. P. O’Sullivan, and W. T. Lee, Phys. Rev. Lett. 112, 048701 (2014).
- Gleeson et al. (2016) J. P. Gleeson, K. P. O’Sullivan, R. A. Baños, and Y. Moreno, Phys. Rev. X 6, 021019 (2016).
- Qiu et al. (2017) X. Qiu, D. F. M. Oliveira, A. Sahami Shirazi, A. Flammini, and F. Menczer, Nature Human Behaviour 1, 0132 (2017).
- Pastor-Satorras et al. (2015) R. Pastor-Satorras, C. Castellano, P. Van Mieghem, and A. Vespignani, Rev. Mod. Phys. 87, 925 (2015).
- Szabo and Huberman (2010) G. Szabo and B. A. Huberman, Communications of the ACM 53, 80 (2010).
- Borge-Holthoefer et al. (2012) J. Borge-Holthoefer, A. Rivero, and Y. Moreno, Phys. Rev. E 85, 066123 (2012).