Interactive Lévy Flight in Interest Space
Compared to the well-studied topic of human mobility in real geographic space, very few studies focus on human mobility in virtual space, such as the space of interests, knowledge, and ideas. However, mobility in virtual space relates to the management of public opinion, knowledge diffusion, and innovation. In this paper, we assume that the interests of a group of online users span a Euclidean space, which we call the interest space, and that the transfers of user interests can be modeled as Lévy Flights in the interest space. To account for the interaction between users, we assume that the random walkers are not independent but interact with each other indirectly via the digital resources in the interest space. The model successfully reproduces a set of scaling laws describing the growth of the attention flow networks of real online communities, and the ranges of the scaling exponents are similar to those of the empirical data. Further, we can infer parameters describing the individual behaviors of the users from the scaling laws of the empirical attention flow network. Our model not only provides a theoretical understanding of human online behaviors, but also has wide potential applications, such as the dissemination and management of public opinion and online recommendation.
Keywords: human mobility, Lévy Flight, collective attention, interest space, attention flow networks
Everything is moving. Understanding the mobility patterns of humankind is of great importance since it relates to epidemics[1, 2], urban planning[3, 4], and other issues in modern cities[5, 6, 7, 8, 9, 10]. Many studies on human mobility in real space have been carried out in past decades[11, 12, 13]. For instance, it has been found that Lévy Flight[14, 15, 16, 17], one of the most famous random walk models and one that differs significantly from Brownian motion, can be used to characterize human movements. However, human mobility takes place not only in real space but also in virtual space[19, 20]. For example, our consciousness constantly jumps between different ideas, which can be understood as a virtual movement in interest space. A large number of users surfing an online community and jumping between different posts can also be understood as a collective movement in interest space[22, 23, 24, 25]. Although virtual space seems less tangible than physical space, studying it is of great significance because it may help us to understand issues in psychological therapy, the dissemination and management of public opinion, online recommendation, and so on.
Some important results on the collective behavior of users in online communities have been obtained. For example, conventional studies of human dynamics usually concern the statistics of a single interest (digital resource), such as access time and frequency, or the distributions of the number of visited pages and the decay patterns of popularity. Besides, human beings are social animals. Thus, much attention is paid in studies of social networks to how people interact, correlate, and connect with each other[31, 32, 33, 34]. The interaction and correlation between people can also be reflected by statistical laws[11, 12, 13]. For example, super-linear scaling of productivity is found in both cities and online communities[4, 21], which means that cooperation between people may make the total productivity of human organizations grow faster than their sizes, so that per capita productivity increases with size. In contrast, the widely observed sub-linear scaling of the diversity of places or interests indicates a slower increase of diversification. This is also an emergent, indirect result of interactions between people.
Although human dynamics and complex networks have drawn wide attention, little attention has been paid to the sequential movements of users between interests, because the concept of the interest space is not apparent. Recently, some attempts have been made to visualize the virtual space using the attention flow network model[35, 36], in which nodes represent digital resources (posts, tags, articles, etc.). The network is constructed according to the collective behaviors of a large number of users. However, the attention flow network is built from user data[22, 37]; that is, it is a representation of the interest space rather than the space itself.
In this paper, we focus on the collective movements of users in their interest space. We assume that all the possible interests of users span a Euclidean space in which adjacent points stand for similar interests, and that users perform Lévy Flights in this space. However, the naive Lévy Flight model[14, 16] cannot reproduce the required scaling laws because it lacks interactions. Thus, we build an interactive Lévy Flight model in which interactions between users occur indirectly, bridged by digital resources. The model successfully reproduces all the scaling laws of concern, and the ranges of the scaling exponents can be calibrated by adjusting the parameters of the model. Our model may not only deepen our understanding, but also improve the accuracy of predictions of user behaviors[20, 38], leading to wide applications in recommendation[29, 39], search, user profiling, etc.
To understand the scaling phenomena of production and diversity, and the relationship between users and digital resources, we construct an interactive Lévy Flight model to simulate user behaviors. Let us consider an online community (such as Baidu Tieba, Stack Exchange, or Flickr) which contains a large number of registered users. All their interests span an interest space, which is modeled as a 2-dimensional Euclidean plane divided into cells (see figure 1-(A)). Each cell represents one possible interest, such as a music style or a type of article, and two adjacent cells represent similar contents. Meanwhile, the digital resources generated by users (articles, tags, Q&As, and so forth) can be projected onto this interest space. For each cell, identified by its coordinates, we record the number of units of digital resource projected onto it at each time. A cell occupied by at least one unit of digital resource is called an Active Site, meaning that a resource exists whose theme is the corresponding interest. In figure 1-(A), the labeled cells are active sites occupied by digital resources. When a digital resource is generated by a user, it is projected onto the space and can be visited or read by other users. Thus, all users interact with each other indirectly through the active sites.
Users’ sequential behaviors in a session, such as browsing, posting, and Q&A, can be viewed as a walk in the interest space. We assume that the user’s walk follows the 2-dimensional Lévy Flight law, meaning that the movement of a user is basically random and the probability density function of the distance moved in one jump is a power law (equation (1)): the density decays as the jump distance raised to the negative of the Lévy exponent.
This movement rule describes how a user’s interest transfers: the interest stays in a narrow area for a long time, but occasionally makes a long-range jump with a small probability. The exponent of the Lévy flight (with values in [1,3]) characterizes how wide the interests of users are. If the exponent is small, users frequently perform long-range random jumps, representing wide interests, and vice versa. Thus, the exponent is a parameter that controls the trade-off between familiarity and novelty: users usually consume familiar topics, but occasionally seek out new information. In figure 1-(A), the arcs labeled with the same number represent the flights of one user, who visits the connected active sites sequentially.
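The jump rule described above can be sketched with inverse-transform sampling. This is a minimal illustration, not the authors' implementation: the names `gamma` (Lévy exponent) and `d_min` (shortest jump, taken as one cell), the uniformly random direction, and the rounding to integer cell coordinates are all assumptions made here. A pure power law without a cutoff is only normalizable for an exponent strictly above 1, so the sketch rejects the boundary value.

```python
import math
import random

def levy_step_length(gamma, d_min=1.0, rng=random):
    """Draw a jump length d >= d_min with density proportional to d**(-gamma).

    Inverse-transform sampling: the survival function is (d/d_min)**(1-gamma),
    so d = d_min * (1-u)**(-1/(gamma-1)) for u uniform in [0, 1).
    """
    if gamma <= 1.0:
        raise ValueError("a pure power law is only normalizable for gamma > 1")
    u = rng.random()
    return d_min * (1.0 - u) ** (-1.0 / (gamma - 1.0))

def levy_jump(x, y, gamma, rng=random):
    """Move from cell (x, y) by one Lévy step in a uniformly random direction.

    Assumption: the interest space is an integer lattice, so the
    displacement is rounded to whole cells.
    """
    d = levy_step_length(gamma, rng=rng)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return x + round(d * math.cos(theta)), y + round(d * math.sin(theta))
```

A smaller `gamma` gives a heavier tail and therefore more frequent long jumps, which matches the reading of a small exponent as wide interests.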
Next, we consider the interaction between users. If a community already has abundant digital resources (such as many posts in a forum), users will continue to visit these resources; otherwise, they will lose interest and quit quickly. To characterize this feature, we assume that a user keeps jumping randomly from the current cell as long as that cell is active (i.e., it holds at least one unit of digital resource); otherwise, he or she leaves the space from that cell. We record the position of each user at each time and assign a special exit state once the user quits the community. The position update is therefore as follows: if the current cell is active, the next position is the current position plus a random displacement whose length follows equation (1); otherwise, the user moves to the exit state.
On the other hand, each user generates new digital resources with a certain probability during the random walk. That is, when the user jumps to a cell, he or she adds a new unit of digital resource to that cell with a fixed probability, which we call the generation probability. The number of units at a cell therefore increases, at each time step, by the number of users who land on that cell and generate a resource there.
In the corresponding update equation, this increment is written as a sum, over all users performing Lévy Flights in the space, of the product of two factors: a delta function, which equals 1 only if the user’s position coincides with the cell, and a Bernoulli random variable, which equals 1 with the generation probability and 0 otherwise.
Next, we consider the situation with many users. Suppose that in one simulation (a session) all users are placed at the origin of the interest space and begin their Lévy Flights from the origin simultaneously. Although they do not interact directly, they can influence each other via the active sites. The simulation ends when all users have exited the space. The ending time tends to increase with the number of users, implying that the indirect interactions keep users in the community for a longer time, since the probability that a user encounters resources left by others increases. One trajectory is generated for each user over his or her lifespan in the community.
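One session of the model described above might be simulated as follows. This is a sketch under several assumptions that the text does not fix: the origin is seeded with one unit of resource so that it is active at the start, users are updated one after another within each time step, a user who lands on an inactive cell without generating a resource exits and that cell is not recorded in the trajectory, and the space is an integer lattice with a shortest jump of one cell. The parameter names `gamma` (Lévy exponent, must exceed 1 here) and `p` (generation probability) are chosen for illustration.

```python
import math
import random

def simulate(n_users, gamma, p, seed=0, max_steps=10_000):
    """Run one session and return (trajectories, resources).

    trajectories[u] lists the active cells visited by user u, in order.
    resources maps a cell (x, y) to its number of units of digital resource.
    """
    rng = random.Random(seed)
    resources = {(0, 0): 1}  # assumption: the origin starts as an active site
    position = {u: (0, 0) for u in range(n_users)}  # users still in the space
    trajectories = [[(0, 0)] for _ in range(n_users)]
    for _ in range(max_steps):
        if not position:  # every user has exited: the session ends
            break
        for u in list(position):
            x, y = position[u]
            # Power-law jump length (inverse transform) and random direction.
            d = (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0))
            theta = rng.uniform(0.0, 2.0 * math.pi)
            cell = (x + round(d * math.cos(theta)),
                    y + round(d * math.sin(theta)))
            # With probability p the user leaves a new resource at the cell.
            if rng.random() < p:
                resources[cell] = resources.get(cell, 0) + 1
            if resources.get(cell, 0) > 0:
                position[u] = cell
                trajectories[u].append(cell)
            else:
                del position[u]  # inactive cell: the user quits the community
    return trajectories, resources
```

For example, `simulate(100, 2.0, 0.1)` returns one trajectory per user. Repeating the call for increasing `n_users` gives the data needed to study how the macroscopic quantities grow with system size.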
To observe the collective behaviors of these Lévy Flighters, we construct an attention flow network as shown in figure 1-(B). The attention flow network is an open flow network in which nodes represent digital resources (active sites) and weighted directed links represent the transition flows between two nodes formed by the collective behaviors of the random walkers. In the model, the weight of the edge from one active site to another is defined as the total number of jumps that all users make directly from the first site to the second during the simulation.
Two special nodes, the source and the sink, are added to represent the environment. When a random walker makes its first jump to an active site in the interest space, a unit of flux from the source to that node is added to the attention flow network. Likewise, a unit of flux from a node to the sink is added if that node is the last site visited by a random walker. The attention flow network can characterize the collective properties of a large number of users for both the simulated model and the empirical data.
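The construction described above can be sketched as a weighted edge counter built from user trajectories. The function name and the string labels used for the source and sink are hypothetical, and each trajectory is assumed to be the ordered list of active sites visited by one user.

```python
from collections import Counter

SOURCE, SINK = "source", "sink"  # hypothetical labels for the environment

def attention_flow_network(trajectories):
    """Return a Counter mapping directed edges (a, b) to their flux.

    Each trajectory contributes one unit of flux from the source to its
    first site, one unit per jump between consecutive sites, and one unit
    from its last site to the sink.
    """
    weights = Counter()
    for path in trajectories:
        if not path:
            continue
        weights[(SOURCE, path[0])] += 1
        for a, b in zip(path, path[1:]):
            weights[(a, b)] += 1
        weights[(path[-1], SINK)] += 1
    return weights
```

With this construction, the total flux leaving the source and the total flux entering the sink both equal the number of users, which is what makes the network an open flow network.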
To validate our model, we study how the network properties change as the size of the system increases, and check whether the scaling laws observed in real communities can be reproduced by our model. Here, we use the total number of users as the measure of size. In fact, this quantity is also the total influx to the attention flow network in a given simulation. In the following, we focus on how the macroscopic properties change with the number of users.
Here, we focus on three basic macroscopic variables. First, the activity of the community is defined as the total number of transitions of interest (jumps). Second, the total number of active sites, i.e., the total number of nodes in the network, measures the diversity of interests of all members of the community. Third, the total number of edges of the network measures the diversification of interest transitions. According to our simulated data, all three variables scale with the number of users as power laws with different exponents; that is, each variable is proportional to the number of users raised to its own exponent. These three exponents, which we refer to as the activity exponent, the node exponent, and the edge exponent, characterize the growth speed of each quantity relative to the size of the system, as shown in figure 2.
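A scaling exponent of this kind is commonly estimated as the slope of a straight-line fit in log-log coordinates. The sketch below assumes ordinary least squares, which the text does not specify, and the helper names are hypothetical. It also shows one way to count the three macroscopic variables from a list of trajectories, where the source and sink links are left out of the counts by assumption.

```python
import math

def macroscopic_variables(trajectories):
    """Return (activity, nodes, edges) for a list of site sequences.

    activity: total number of jumps; nodes: number of distinct sites;
    edges: number of distinct directed transitions between sites.
    """
    jumps = [(a, b) for path in trajectories for a, b in zip(path, path[1:])]
    nodes = {site for path in trajectories for site in path}
    return len(jumps), len(nodes), len(set(jumps))

def fit_scaling_exponent(sizes, values):
    """Least-squares fit of log(value) = intercept + slope * log(size).

    Returns (slope, intercept); the slope estimates the scaling exponent.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

In use, one would run the model (or take empirical sessions) at several system sizes, compute the three variables at each size, and fit each variable against the number of users separately.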
To compare with the simulated data, we also plot the empirical scaling laws for the same quantities in representative online communities, namely Baidu Tieba (each jump represents a click, see figure 3-(a,b,c)) and Stack Exchange (each jump represents an answer, see figure 3-(d,e,f)).
We observe that all the communities follow the same scaling laws as the simulated results, and the values of the exponents are also similar. First, we notice that the activity exponent is always larger than 1.0 for different parameter values in the simulations. This observation also holds for the empirical data. As shown in figure 4-(a, b), we systematically calculate the exponents of the Baidu Tiebas and of the communities in Stack Exchange and plot the distributions of the exponents.
It is clear that the distributions of the four exponents are nearly normal, with some right skew, and that the average activity exponent is significantly larger than 1. The activity exponents of some small Tiebas are less than 1 because their scaling is not statistically significant.
We further confirm the super-linear relationship between activity and the number of users for more online communities, as shown in table 1. All the exponents are larger than 1. Among them, the communities with intensive interactions between users, such as Baidu Tieba, Stack Exchange, and Digg, have larger exponents.
|Sites||Baidu Tieba||Stack Exchange||Delicious||Flickr||Yelp tip||Digg|
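In fact, the activity exponent can be understood as an indicator of the intensity of social interactions between the users of a community. Dividing the scaling law of activity by the number of users, we find that the average number of jumps per user scales with the number of users with an exponent equal to the activity exponent minus one.
This indicates that the average number of jumps per user increases with the size of the system if the activity exponent is larger than 1, and that the relative speed of this increase grows with the exponent. Therefore, if the activity exponent is large, the average activity generated per user is sensitive to the total number of users. This characterizes the nonlinearity of the interactions between users.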
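The step from the scaling law of activity to the per-user statement above can be written out explicitly. The symbols below are notation assumed here for illustration: A for the activity, N for the number of users, and α for the activity exponent.

```latex
A \propto N^{\alpha}
\quad\Longrightarrow\quad
\frac{A}{N} \propto N^{\alpha - 1},
\qquad
\frac{\mathrm{d}\,\ln (A/N)}{\mathrm{d}\,\ln N} = \alpha - 1 .
```

The average number of jumps per user therefore grows with N exactly when α > 1, and a doubling of N multiplies it by a factor of 2^(α − 1).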
For comparison, we investigate the possible range of the activity exponent in our simulations. As shown in Fig. 5, for a given Lévy exponent the activity exponent increases with the generation probability, which means that as users’ propensity to generate resources increases, the average intensity of interaction also increases.
If we understand activity as a kind of production by users, then the activity exponent characterizes the productivity of the group of users. If it is easy for users to express their interests (i.e., the generation probability increases), the online community is more productive.
Next, we analyze the node exponent, which describes the scaling between the number of nodes of the attention flow network and the size of the system. This scaling law indicates how the diversification of the digital resources generated by the users changes with size. We find that, both for the simulation (see figure 3) and for the empirical data (see figure 5), the exponents are significantly less than one, which indicates sub-linear scaling between diversity and the size of the system. Such sub-linear scaling is commonly observed in other complex systems.
The total number of edges of the attention flow network measures the diversity of distinct transitions between pairs of nodes. However, the edge exponent varies widely: both super-linear and sub-linear scaling are possible for different communities, and the simulations show a transition from sub-linear to super-linear scaling.
When the generation probability increases, both diversification exponents (the node exponent and the edge exponent) increase. That is, users’ propensity to generate content accelerates the speed at which diverse content is produced relative to the size of the system. It is interesting to observe another scaling behavior, between the number of edges and the number of nodes: the number of edges scales as a power of the number of nodes, and this relationship is super-linear. The same phenomenon has been observed in a large number of networks and is known as densification. Our model successfully reproduces this phenomenon. All the ranges of the exponents in the model are consistent with those in the empirical data, which implies that our model captures the scaling behaviors in the data.
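As a consistency sketch, the relation between edges and nodes can be connected to the two diversification scaling laws by eliminating the number of users. This assumes that both laws hold exactly, which the text does not claim, and the symbols are notation assumed here: V for the number of nodes, E for the number of edges, N for the number of users, and β and δ for the node and edge exponents.

```latex
V \propto N^{\beta}, \quad E \propto N^{\delta}
\quad\Longrightarrow\quad
E \propto V^{\delta / \beta},
\qquad
\frac{\delta}{\beta} > 1 \iff \delta > \beta \quad (\beta > 0).
```

Under this assumption, densification corresponds to the edge exponent exceeding the node exponent, whether or not either exponent is itself above one.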
We further test how the Lévy exponent influences the other exponents. The results are shown in figure 5. The qualitative dependence of the scaling exponents on the generation probability does not change dramatically, but the range over which the exponents vary is different.
We also note that one of the exponents fluctuates within a relatively small range as the generation probability varies, but changes dramatically with the Lévy exponent. Thus, we conjecture that this exponent depends almost exclusively on the Lévy exponent, and we suppose that this dependence can be used to infer the Lévy exponent of a real community.
4 Inference of the Lévy exponent and the generation probability
Next, we infer the Lévy exponent and the generation probability of each community from its empirical exponents using the maximum likelihood principle. We suppose that the three empirical exponents (for activity, nodes, and edges) are randomly sampled from the model. For a given Lévy exponent and generation probability, each empirical exponent follows a normal distribution whose center is the corresponding exponent generated by the model and whose standard deviation is the same for all three.
The exponents generated by the model for given parameters can be read from the dependence shown in figure 5. To infer the two parameters from the empirical exponents, we maximize the likelihood (eq. 14), that is, the product of the three normal densities.
This is equivalent to minimizing the distance between the vector of model exponents and the vector of empirical exponents.
That is, we should find the most probable parameters, for which the simulated exponents are closest to the empirical ones.
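With independent normal errors that share one standard deviation, maximizing the likelihood amounts to minimizing the sum of squared differences between model and empirical exponents. The sketch below carries this out as a grid search over a lookup table of simulated exponents. The function name, the table layout, and every number in the example table are hypothetical placeholders, not values from the paper.

```python
def infer_parameters(empirical, model_table):
    """Return the parameter pair whose model exponents best match the data.

    empirical: tuple of measured exponents (activity, nodes, edges).
    model_table: dict mapping (levy_exponent, generation_probability) to the
    tuple of exponents the model produces for those parameters.
    """
    def squared_distance(model):
        return sum((m - e) ** 2 for m, e in zip(model, empirical))

    return min(model_table,
               key=lambda params: squared_distance(model_table[params]))

# Purely illustrative table: these numbers are made up for the example.
example_table = {
    (1.5, 0.1): (1.10, 0.80, 0.95),
    (1.5, 0.3): (1.30, 0.90, 1.10),
    (2.5, 0.1): (1.05, 0.60, 0.70),
    (2.5, 0.3): (1.20, 0.70, 0.90),
}

best = infer_parameters((1.22, 0.72, 0.88), example_table)
```

A finer grid of simulated parameters gives a finer estimate. If the standard deviations of the three exponents differed, each squared difference would have to be divided by the corresponding variance before summing.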
In figure 6 (a, b), we show the inferred parameters for all Baidu Tiebas (a) and all communities of Stack Exchange (b). We notice that the Tiebas can be roughly separated into two groups according to their parameters: they have similar generation probabilities but different Lévy exponents. The Lévy exponent indicates how dissimilar a user’s interests are before and after one transition. Thus, users in Tiebas with a small Lévy exponent tend to have relatively wide interests. All the Tiebas have very small generation probabilities, meaning that the tendency to post a new thread is much weaker than the tendency to click. In contrast, the communities in Stack Exchange are concentrated in one region of the parameter space, which means that users in Stack Exchange tend to have wide interests and seldom post questions. However, compared to Tieba, Stack Exchange communities have larger generation probabilities, meaning that asking a question, relative to answering one, is easier than posting a thread relative to clicking on threads.
In this paper, we build an interactive Lévy flight model to simulate the random walk behaviors of users in a virtual interest space. We assume that users interact indirectly via digital resources. Two important parameters controlling the Lévy flight’s behavior, namely how wide users’ interests are and the propensity of a user to deliver a post, determine the structure of the attention flow network. We compare the statistical properties of the attention flow network with those of empirical online communities from the perspective of scaling laws. Four different scaling laws characterize how the macroscopic quantities (activity, the diversification of resources generated by users, and the diversification of interest transfers) scale with the number of users, and the exponents characterize the relative growth speeds. All the scaling behaviors and the ranges of exponents in the simulations are in accordance with the empirical data. We can then infer the two important parameters, the Lévy exponent and the generation probability, if the exponents are measured.
Therefore, the interest transitions of users may be characterized by a simple random walk model on a 2-dimensional space spanned by the interests of users. The key that may explain the origin of the scaling laws observed in the empirical communities is the indirect interaction between users. In our model, we assume that users stay in the system only if they can find published digital resources that feed their interests. This is the key to the indirect interaction and to the super-linear scaling of activity, because when the number of users increases, the interactions between users also increase, but at a faster rate.
This work not only provides a theoretical understanding of online communities, but also suggests potential applications. First, the scaling exponents can be treated as novel indicators that characterize the growth of communities. For example, the activity exponent may indicate the level of interactive stickiness of a community, since it increases with the intensity of the interactions between users. The merits of using the exponents to quantify communities include the stability of the exponents and their independence of the size of the community. Therefore, we can make a reasonable evaluation of a forum or a community while it is still small.
Second, we can infer the parameters from the measured exponents. These parameters describe the behaviors of individual users. Thus, our work makes it possible to infer individual behavior solely from the macroscopic, collective performance of a community. Conversely, it is also possible to predict the macroscopic behavior if the individual parameters are known.
Third, we pave the way toward connecting mobility in the real and virtual worlds. Our model suggests that human mobility in the virtual world may follow the same statistical law as in the real world, and that the interactions between people may play an important role.
Finally, the current model has drawbacks. First, we provide only indirect evidence for mobility in the virtual world; the actual interest space may not be 2-dimensional, or even Euclidean. Second, the model simplifies human behavior to a large extent, and it may not work if other factors need to be considered. Third, more empirical data should be collected to test our model.
We gratefully acknowledge funding support from the National Natural Science Foundation of China (grant 61673070), the Fundamental Research Fund for the Central Universities (grant 310421103), and the Beijing Normal University Interdisciplinary Project.
- Grenfell B T, Bjørnstad O N and Kappey J 2001 Nature 414 716
- Belik V, Geisel T and Brockmann D 2011 Physical Review X 1 3103–3106
- Ratti C, Frenchman D, Pulselli R M and Williams S 2006 Environment and Planning B: Planning and Design 33 727–748
- Lathia N, Quercia D and Crowcroft J 2012 Pervasive Computing 7319 91–98
- Wang P, González M C, Hidalgo C A and Barabási A L 2009 Science 324 1071–6
- Ratti C, Sobolevsky S, Calabrese F, Andris C, Reades J, Martino M, Claxton R and Strogatz S H 2010 Plos One 5 e14248
- Lathia N and Capra L 2011 Mining mobility data to minimise travellers’ spending on public transport ACM SIGKDD International Conference on Knowledge Discovery and Data Mining pp 1181–1189
- Yuan J, Zheng Y and Xie X 2012 Discovering regions of different functions in a city using human mobility and pois ACM SIGKDD International Conference on Knowledge Discovery and Data Mining pp 186–194
- Cacciapuoti A S, Calabrese F, Caleffi M, Lorenzo G D and Paura L 2012 Ad Hoc Networks 10 1520–1531
- Santi P, Resta G, Szell M, Sobolevsky S, Strogatz S and Ratti C 2013 Proceedings of the National Academy of Sciences
- Brockmann D, Hufnagel L and Geisel T 2006 Nature 439 462–5
- González M C, Hidalgo C A and Barabási A L 2009 Nature 458
- Song C, Koren T, Wang P and Barabási A 2010 Nature Physics 6 818–823
- Viswanathan G M, Afanasyev V, Buldyrev S V, Murphy E J, Prince P A and Stanley H E 1996 Nature 381 413–415
- Lomholt M A, Tal K, Metzler R and Joseph K 2008 Proceedings of the National Academy of Sciences of the United States of America 105 11055
- Raposo E P, Buldyrev S V, Luz M G E D, Viswanathan G M and Stanley H E 2009 Journal of Physics A Mathematical and Theoretical 42 434003
- Rhee I, Shin M, Hong S, Lee K, Kim S J and Chong S 2011 IEEE/ACM Transactions on Networking 19 630–643
- Bartumeus F, Catalan J, Fulco U L, Lyra M L and Viswanathan G M 2002 Physical Review Letters 88 097901
- Lambiotte R, Blondel V D, Kerchove C D, Huens E, Prieur C, Smoreda Z and Dooren P V 2008 Physica A Statistical Mechanics and Its Applications 387 5317–5325
- Liang H, Silva R N, Ooi W T and Motani M 2009 Multimedia Tools and Applications 45 163–190
- Wu F and Huberman B A 2007 Proceedings of the National Academy of Sciences 104 17599
- Wu L, Jiang Z and Min Z 2014 Plos One 9 e102646
- Sasahara K, Hirata Y, Toyoda M, Kitsuregawa M and Aihara K 2013 Plos One 8 e61823
- Kenett D Y, Morstatter F, Stanley H E and Liu H 2014 Plos One 9 e102001
- Hawelka B, Sitko I, Beinat E, Sobolevsky S, Kazakopoulos P and Ratti C 2014 Cartography and Geographic Information Science 41 260
- Li J, Theng Y L and Foo S 2014 Cyberpsychology Behavior and Social Networking 17 519–527
- Etter V, Kafsi M, Kazemi E, Grossglauser M and Thiran P 2013 Pervasive and Mobile Computing 9 784–797
- Watts D J and Dodds P S 2007 Journal of Consumer Research 34 441–458
- Smith D, Menon S and Sivakumar K 2005 Journal of Interactive Marketing 19 15–37
- Păun C, Bratianu C, Pinzaru F and Zbuchea A 2016 Management Dynamics in the Knowledge Economy 4 125–140
- Liben-Nowell D and Kleinberg J 2003 The link prediction problem for social networks Twelfth International Conference on Information and Knowledge Management pp 556–559
- Mislove A, Marcon M, Gummadi K P, Druschel P and Bhattacharjee B 2007 Measurement and analysis of online social networks ACM SIGCOMM Conference on Internet Measurement 2007, San Diego, California, Usa, October pp 29–42
- Kumar R, Novak J and Tomkins A 2010 Structure and evolution of online social networks Link Mining: Models, Algorithms, and Applications pp 611–617
- Cho E, Myers S A and Leskovec J 2011 Friendship and mobility:user movement in location-based social networks ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, Ca, Usa, August pp 1082–1090
- Shi P, Huang X, Wang J, Zhang J, Deng S and Wu Y 2015 Plos One 10 e0136243
- Zeng W, Fu C W, Arisona S M, Schubiger S, Burkhard R and Ma K L 2017 IEEE Transactions on Intelligent Transportation Systems PP 1–14
- Frank M R, Mitchell L, Dodds P S and Danforth C M 2013 Scientific Reports 3 2625
- Yuan T, Cheng J, Zhang X, Liu Q and Lu H 2015 Knowledge-Based Systems 88 70–84
- Pennacchiotti M and Gurumurthy S 2011 Investigating topic models for social media user recommendation International Conference on World Wide Web, WWW 2011, Hyderabad, India, March 28 - April pp 101–102
- Cole M J, Hendahewa C, Belkin N J and Shah C 2015 Acm Transactions on Information Systems 33 1
- Middleton S E, Shadbolt N R and Roure D C D 2004 Acm Transactions on Information Systems 22 54–88
- Guo L, Lou X, Shi P, Wang J, Huang X and Zhang J 2015 Physica A Statistical Mechanics and Its Applications 437 235–248