Independence of Sources in Social Networks
Online social networks are increasingly studied. The links between users of a social network are important and have to be well characterized in order to detect communities and find influencers, for example. In this paper, we present an approach based on the theory of belief functions to estimate the degrees of cognitive independence between users in a social network. We evaluate the proposed method on a large amount of data gathered from the Twitter social network.
Keywords: Cognitive dependence, Theory of belief functions, Twitter social network, Independence measure.
1 Introduction
Online social networks are online platforms that connect users. They have gained a lot of interest and popularity over the last decade. Many people rely on social networks, particularly on the information, news and opinions that users share on diverse subjects.
An online social network such as Twitter helps users share subjective information reflecting their personal opinions. In a social network, users become sources who produce different kinds of information (opinions, facts, news, rumors, etc.). However, some users are cognitively dependent on others. In addition, an online social network enables its users to interact with each other through several activities such as sharing, quoting or commenting on other users' posts. These interactions provide insight into the cognitive dependence/independence relationships among users. A user is considered cognitively dependent on another user if he relies on and adopts the information that the other user provides.
The aim of this paper is to study the dependencies of sources in social networks. Information about sources' dependencies in a social network can be used to detect related groups, communities, etc.
The identification of communities can help with targeted marketing. It can also be used for influence propagation, to promote new products and define new marketing strategies. Indeed, a company wishing to launch a marketing campaign or a new product can use dependence relations to speed up the propagation.
In this paper, we propose an approach to estimate the degrees of independence/dependence between users of a social network. Twitter is chosen as an example of a directed social network; thus, we detail the proposed measure using Twitter vocabulary. The dependence relationship between users is a directed relation; therefore, Twitter is well suited to illustrate our approach.
The proposed approach relies on the theory of belief functions to assess uncertain degrees of belief in the independence of users. This theory is also chosen for its large number of combination rules, which merge subjective information.
The remainder of this paper is organized as follows: Section 2 recalls some basic concepts of the theory of belief functions; Section 3 details the proposed approach to estimate degrees of independence/dependence; Section 4 presents an experimental study of our approach; and Section 5 concludes.
2 Theory of belief functions
The theory of belief functions, also called Dempster-Shafer theory, was first introduced by Dempster and mathematically formalized by Shafer. This theory models imprecise, uncertain and missing data.
In the theory of belief functions, a frame of discernment, denoted $\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$, is a set of exhaustive and mutually exclusive hypotheses $\omega_i$; only one of them is likely to be true.
The power set, $2^\Omega$, enumerates the subsets of $\Omega$. It includes not only the hypotheses of $\Omega$ but also the disjunctions of these hypotheses.
The true hypothesis in $\Omega$ is unknown; thus, a degree of belief is assigned to subsets of $\Omega$, reflecting our degree of faith in the truth of each subset of $\Omega$.
A basic belief assignment (bba), also called mass function, is denoted $m$ and defined such that:
$$m : 2^\Omega \rightarrow [0, 1], \qquad \sum_{A \subseteq \Omega} m(A) = 1$$
The mass $m(A)$ represents the degree of belief in the truth of $A$. When $m(A) > 0$, $A$ is called a focal element.
In the theory of belief functions, decisions are generally made using pignistic probabilities, introduced by Smets. The pignistic probability, denoted $BetP$, is deduced from $m$ as follows:
$$BetP(\omega_i) = \sum_{A \subseteq \Omega,\ \omega_i \in A} \frac{m(A)}{|A| \, (1 - m(\emptyset))}$$
where $|A|$ is the number of hypotheses that compose $A$.
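The pignistic transformation above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `pignistic`, the representation of a mass function as a dictionary keyed by `frozenset`, and the three-hypothesis frame are assumptions made only for illustration.

```python
def pignistic(m, frame):
    """Return BetP for every hypothesis of `frame` given a mass function `m`.

    `m` maps frozenset subsets of `frame` to masses that sum to 1
    (illustrative representation, not the authors' code).
    """
    frame = frozenset(frame)
    total = sum(m.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"masses must sum to 1, got {total}")
    if any(not subset <= frame for subset in m):
        raise ValueError("every focal element must be a subset of the frame")
    empty_mass = m.get(frozenset(), 0.0)
    if empty_mass >= 1.0:
        raise ValueError("all the mass is on the empty set")

    betp = {hypothesis: 0.0 for hypothesis in frame}
    for subset, mass in m.items():
        for hypothesis in subset:
            # Each focal element shares its mass equally among its hypotheses.
            betp[hypothesis] += mass / (len(subset) * (1.0 - empty_mass))
    return betp


# Hypothetical frame and mass function, chosen only to exercise the formula.
frame = {"a", "b", "c"}
m = {
    frozenset({"a"}): 0.5,
    frozenset({"a", "b"}): 0.3,
    frozenset(frame): 0.2,
}
betp = pignistic(m, frame)
decision = max(betp, key=betp.get)  # hypothesis with the highest BetP
```

Here the mass on the whole frame (total ignorance) is split equally among the three hypotheses, so the decision falls on the hypothesis that also receives the most specific support.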
In the theory of belief functions, combination rules merge distinct mass functions in order to produce more reliable information. Combination consists in building a unique mass function from several elementary mass functions arising from multiple distinct sources of information.
Dempster's rule of combination is the first rule that merges several mass functions provided by distinct and independent sources. The combination of two mass functions $m_1$ and $m_2$ provided by sources $S_1$ and $S_2$ is given as follows:
$$m_{1 \oplus 2}(A) = \begin{cases} \dfrac{\sum_{B \cap C = A} m_1(B) \, m_2(C)}{1 - \sum_{B \cap C = \emptyset} m_1(B) \, m_2(C)} & \text{if } A \neq \emptyset \\ 0 & \text{if } A = \emptyset \end{cases}$$
The reliability of evidential information is not always ensured. In fact, evidential data can be supplied by a partially reliable or an unreliable source. In order to take the source's reliability into account, its beliefs are discounted proportionally to its reliability. Let $\alpha$ be the reliability of a source $S$ and $m$ a mass function provided by $S$. The discounting of $m$ produces $m^\alpha$ defined by:
$$m^\alpha(A) = \alpha \, m(A) \quad \forall A \subset \Omega, \qquad m^\alpha(\Omega) = 1 - \alpha \, (1 - m(\Omega))$$
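As a hedged illustration of the two operations recalled above, the following Python sketch implements Dempster's normalized rule and classical discounting for mass functions stored as dictionaries keyed by `frozenset`. The function names, the two-hypothesis frame and the numeric masses are assumptions introduced for this example only.

```python
def dempster(m1, m2):
    """Combine two mass functions with Dempster's normalized rule."""
    combined = {}
    conflict = 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            intersection = b & c
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
            else:
                # Product mass falling on the empty set is the conflict.
                conflict += mass_b * mass_c
    if 1.0 - conflict < 1e-12:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {a: mass / (1.0 - conflict) for a, mass in combined.items()}


def discount(m, alpha, frame):
    """Discount `m` by the reliability `alpha` in [0, 1] of its source."""
    frame = frozenset(frame)
    discounted = {a: alpha * mass for a, mass in m.items() if a != frame}
    # The mass removed from the focal elements is transferred to the frame.
    discounted[frame] = 1.0 - alpha * (1.0 - m.get(frame, 0.0))
    return discounted


# Hypothetical frame and masses, chosen only to exercise the formulas.
OMEGA = frozenset({"x", "y"})
m1 = {frozenset({"x"}): 0.6, OMEGA: 0.4}
m2 = {frozenset({"y"}): 0.5, OMEGA: 0.5}

m12 = dempster(m1, m2)              # conflict is 0.6 * 0.5 = 0.3
m1_half = discount(m1, 0.5, OMEGA)  # a half-reliable source
```

With `alpha = 1` the mass function is left unchanged, and with `alpha = 0` it becomes the vacuous mass function that puts all the mass on the frame.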
3 Uncertain Measure of Independence in Twitter
Much research has focused on measuring independence in social networks. Leenders proposed an approach focused on the opinions and attitudes of users in a social system. These opinions and attitudes are shaped by social influence. The proposed approach depends partially on individual characteristics.
Kudelka et al. make use of the measurement of dependence between network vertices to detect communities in social networks.
To predict a user's actions (behaviors) in a social network, Tan et al. consider diverse factors: the influence of his friends, the correlation between users' actions, and his historical behavior. They conducted an experiment on Twitter and found that the more friends perform an action, the more a user tends to perform it, and that the likelihood that two friends perform an action at the same time is always larger than the likelihood that two random users perform it at the same time.
Jendoubi et al. propose to detect influencers in Twitter using the theory of belief functions. They consider three Twitter metrics to quantify the influence between users: followers, mentions and retweets.
Twitter is a social network that enables its users to establish many types of relation between them. A relation between users of Twitter may be a follow, a retweet, a mention or a citation.
These ties are considered as dependence indicators for several reasons. First, retweets represent the amount of information that a user relays from the tweets of another user. This amount reflects the degree of adoption of the opinions of other users.
Second, mentions represent the quantity of messages sent directly to specific users in order to establish direct communication with them. These actions reflect the importance that some Twitter users and their ideas have for other users in the network.
Finally, citations represent the degree of reliance of some users on other users, expressed by citing them in their tweets.
Therefore, we consider that degrees of dependence between users of Twitter can be deduced from the numbers of follows, retweets, mentions and citations. In this paper, we propose to estimate degrees of cognitive dependence between users of Twitter. Two users are cognitively dependent when the information provided by one user is affected by the information produced by the other. Note that cognitive independence is a subject of research in the theory of belief functions. Two variables are assumed to be cognitively independent with respect to a belief function if any new evidence that appears on only one of them does not change the evidence of the other variable. In addition, two sources are cognitively independent if they do not communicate and if their evidential corpora are different. Two dependent sources are either positively or negatively dependent; in the case of negative dependence, the sources are dependent but their ideas are different. Influencers, in contrast, are sources that have a maximum impact on the ideas of others. Dependence and influence measures are distinct but closely related. Thus, the dependence measure may be used for influence maximization.
A Twitter user $u$ is cognitively dependent on another user $v$ if $u$ is following $v$ and frequently retweets tweets of $v$ and/or frequently mentions $v$ in his tweets.
Figure 1 shows the proposed approach to estimate the independence of users in Twitter. The proposed approach consists of two steps:
In the first step, weights are estimated: we define a weight for each aspect of dependence, namely retweet, mention and citation.
In the second step, independence is estimated: we use the theory of belief functions to (i) model each independence aspect, (ii) combine the aspects and (iii) make a decision regarding the independence of users.
3.1 Step 1: Estimation of weights
In Twitter, a user $u$ following a user $v$ can retweet, mention and/or cite $v$. Each piece of information about retweets, mentions and/or citations may reflect the dependence or the independence of $u$ on $v$. Thus, a vector of weights is assigned to each link, as shown in Figure 2. Note that $u$ is following $v$ and that the vector of weights will be used to learn the independence/dependence of $u$ on $v$.
Let $G = (V, E)$ be the social network, where $V$ is the set of nodes and $E$ is the set of links; $(u, v) \in E$ if $u$ is a follower of $v$ in Twitter. The weights $w_R(u, v)$, $w_M(u, v)$ and $w_C(u, v)$ of the link $(u, v)$ are estimated using the following measures:
The retweet weight, $w_R(u, v) = R_u(v) / R_u$, is the weight reflecting the number of times that $u$ has retweeted the tweets of $v$; $R_u(v)$ is the number of tweets of $v$ that were retweeted by $u$ and $R_u$ is the total number of retweets of $u$.
The mention weight, $w_M(u, v) = M_u(v) / M_u$, is the weight reflecting the number of times that $u$ mentioned $v$ in his tweets; $M_u(v)$ is the number of tweets of $u$ in which $v$ was mentioned and $M_u$ is the total number of mentions of $u$.
The citation weight, $w_C(u, v) = C_u(v) / C_u$, is the weight reflecting the number of times that $u$ quoted the tweets of $v$; $C_u(v)$ is the number of tweets of $v$ that have been quoted by $u$ and $C_u$ is the total number of citations of $u$.
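The three weights described above are ratios of interaction counts, which can be sketched in a few lines of Python. The dictionary keys, the helper names `ratio` and `link_weights`, and the convention of returning 0 when a user has no interaction of a given kind are assumptions for illustration; the text does not say how an empty denominator is handled.

```python
def ratio(part, total):
    """Return part / total, or 0.0 when total is 0 (assumed convention)."""
    return part / total if total else 0.0


def link_weights(counts):
    """Compute the retweet, mention and citation weights of a link (u, v).

    `counts` holds the interaction counts of the follower u; the key names
    are hypothetical and only serve this example.
    """
    return {
        "retweet": ratio(counts["retweets_of_v"], counts["retweets_total"]),
        "mention": ratio(counts["mentions_of_v"], counts["mentions_total"]),
        "citation": ratio(counts["quotes_of_v"], counts["quotes_total"]),
    }


# Hypothetical counts: u made 10 retweets (7 of v), 12 mentions (3 of v)
# and no quote at all.
counts = {
    "retweets_of_v": 7, "retweets_total": 10,
    "mentions_of_v": 3, "mentions_total": 12,
    "quotes_of_v": 0, "quotes_total": 0,
}
weights = link_weights(counts)
```

Each weight lies in the interval [0, 1], which makes it directly usable as a degree of belief in the next step.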
3.2 Step 2: Independence estimation
The dependence estimation is based on the weights defined above. Let $G = (V, E, W)$ be a directed graph where $W$ is the set of weight vectors, such that $(w_R(u, v), w_M(u, v), w_C(u, v))$ is the weight vector associated with the link $(u, v)$. The independence estimation process has three basic steps:
In the first step, a mass function is built from each weight on the link. Let $\mathcal{I} = \{\bar{I}, I\}$ be the frame of discernment of the independence, where $\bar{I}$ is the hypothesis that users are dependent and $I$ is the hypothesis that users are independent. Mass functions are estimated as follows:
First, the retweet weight justifies our belief in the independence of users. Therefore, the retweet mass function $m_R$ is defined as follows:
Note that $\alpha_R$ is a discounting coefficient that takes into account the total number of tweets of $u$. The mass function is more reliable when the number of retweets is large enough in comparison with the total number of tweets. For example, assume that a user $u$ has posted twenty-eight tweets in two weeks and that among these tweets there are ten retweets, seven of which are retweets of $v$. Without discounting using $\alpha_R$, the belief in the dependence of $u$ on $v$ would be equal to $7/10 = 0.7$, which does not reflect reality. In fact, the tweets of $v$ that $u$ has retweeted represent only a quarter ($7/28$) of the total number of tweets of $u$.
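The arithmetic of this example can be written out explicitly. The expressions below are a hedged reconstruction, not the authors' stated formula: they assume that the discounting coefficient is the ratio of the number of retweets to the total number of tweets, that the discounted belief in dependence is the product of this coefficient and the retweet weight, and that the remaining mass goes to total ignorance. The symbols $w_R$, $\alpha_R$, $m_R$, $\bar{I}$ (dependent) and $I$ (independent) are notation chosen for this illustration.

```latex
\begin{align*}
w_R &= \frac{7}{10} = 0.7, \qquad \alpha_R = \frac{10}{28} \approx 0.357 \\
m_R(\bar{I}) &= \alpha_R \, w_R = \frac{7}{28} = 0.25 \\
m_R(I) &= \alpha_R \, (1 - w_R) = \frac{3}{28} \approx 0.107 \\
m_R(\bar{I} \cup I) &= 1 - \alpha_R = \frac{18}{28} \approx 0.643
\end{align*}
```

Under these assumptions the three masses sum to 1, and the belief in dependence drops from 0.7 to 0.25, the quarter mentioned in the example.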
Then, the mention mass function $m_M$ is deduced from the mention weight as follows:
where $\alpha_M$ is a discounting coefficient that takes into account the total number of tweets of $u$ mentioning $v$ with respect to the total number of tweets of $u$.
Finally, the citation mass function $m_C$ is deduced from the citation weight as follows:
where $\alpha_C$ is a discounting coefficient that takes into account the total number of tweets quoted by $u$ with respect to the total number of tweets of $u$.
Then, the retweet, mention and citation mass functions $m_R$, $m_M$ and $m_C$ are combined with Dempster's rule of combination as follows:
$$m = m_R \oplus m_M \oplus m_C$$
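Finally, the degrees of independence and dependence correspond to the pignistic probabilities computed from the combined mass function $m$ on the frame $\{\bar{I}, I\}$, where $\bar{I}$ stands for dependence and $I$ for independence, as follows:
$$BetP(\bar{I}) = m(\bar{I}) + \frac{m(\bar{I} \cup I)}{2}, \qquad BetP(I) = m(I) + \frac{m(\bar{I} \cup I)}{2}$$
The dependence degree is non-negative: it is either positive or null. It is also normalized, since it lies in the interval $[0, 1]$. When $BetP(\bar{I}) = 1$, $u$ is totally dependent on $v$; $BetP(\bar{I}) = 0$ implies that $u$ is totally independent of $v$. The decision is made according to the maximum of the pignistic probabilities: if $BetP(\bar{I}) > BetP(I)$, then $u$ is dependent on $v$; in the opposite case, if $BetP(\bar{I}) < BetP(I)$, $u$ is independent of $v$.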
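The whole second step (modeling, combination, decision) can be sketched end to end in Python. This is an illustration under stated assumptions rather than the authors' implementation: the form of the discounted mass function built from a weight is assumed (belief in dependence equal to the coefficient times the weight, belief in independence equal to the coefficient times one minus the weight, the rest on ignorance), the labels `"D"` and `"I"` stand for the dependent and independent hypotheses, ties are arbitrarily resolved as independent, and all numeric values are hypothetical.

```python
from functools import reduce

DEP = frozenset({"D"})   # hypothesis: u is dependent on v
IND = frozenset({"I"})   # hypothesis: u is independent of v
OMEGA = DEP | IND        # total ignorance


def mass_from_weight(weight, alpha):
    """Build a discounted mass function from a weight (assumed form)."""
    return {DEP: alpha * weight, IND: alpha * (1.0 - weight), OMEGA: 1.0 - alpha}


def dempster(m1, m2):
    """Combine two mass functions with Dempster's normalized rule."""
    combined, conflict = {}, 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            intersection = b & c
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
            else:
                conflict += mass_b * mass_c
    if 1.0 - conflict < 1e-12:
        raise ValueError("total conflict between the dependence aspects")
    return {a: mass / (1.0 - conflict) for a, mass in combined.items()}


def dependence_degrees(masses):
    """Return (dependence, independence) as pignistic probabilities."""
    combined = reduce(dempster, masses)
    ignorance = combined.get(OMEGA, 0.0)
    dependence = combined.get(DEP, 0.0) + ignorance / 2.0
    independence = combined.get(IND, 0.0) + ignorance / 2.0
    return dependence, independence


def decide(dependence, independence):
    """Pick the hypothesis with the larger pignistic probability.

    A tie is reported as "independent" (assumption, not stated in the text).
    """
    return "dependent" if dependence > independence else "independent"


# Hypothetical link: 28 tweets, 10 retweets (7 of v), 12 mentions (3 of v),
# no quote, hence a vacuous citation mass function.
masses = [
    mass_from_weight(7 / 10, 10 / 28),   # retweet aspect
    mass_from_weight(3 / 12, 12 / 28),   # mention aspect
    mass_from_weight(0.0, 0.0),          # citation aspect
]
dep, ind = dependence_degrees(masses)
decision = decide(dep, ind)
```

Because the frame has only two hypotheses, the mass left on ignorance is split equally between dependence and independence, so the two degrees always sum to 1.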
4 Experiments
The proposed approach is tested on data collected from Twitter because it is a directed social network in which a large number of messages are published per day. Unlike other social media platforms such as Facebook, the content of Twitter is public and accessible via programming interfaces. In our experimental study, we used the Twitter Streaming API through a Python library called Tweepy. This library provides access to Twitter data via the Twitter API. The Twitter Streaming API allows data to be retrieved in real time. It also allows tweets to be filtered by keywords or by geographical position. In our case, we are interested in collecting tweets written by specific users. For this purpose, we filtered tweets by a list of user IDs. We crawled Twitter data for the period between 05/06/2017 and 13/08/2017. We collected a large number of tweets (205271 tweets) corresponding to 10350 users over this period. The experiments detailed in this section are carried out on a large number of users, tweets, retweets, mentions and citations, as detailed in Table 1. Note that retweets, mentions and citations are considered as tweets.
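Once tweets are collected, the per-user counts of retweets, mentions and citations have to be extracted from them. The sketch below shows one possible way to do this in Python. It assumes that each tweet is available as a dictionary following the tweet JSON layout of the Twitter API v1.1 (`user`, `retweeted_status`, `quoted_status`, `entities.user_mentions`); the function name, the choice not to count the mentions carried by a retweet, and the sample data are assumptions and do not come from the paper.

```python
from collections import Counter, defaultdict


def count_interactions(tweets):
    """Count, for each author, tweets, retweets, quotes and mentions per target user."""
    stats = defaultdict(lambda: {
        "tweets": 0,
        "retweets": Counter(),
        "quotes": Counter(),
        "mentions": Counter(),
    })
    for tweet in tweets:
        author = tweet["user"]["id_str"]
        author_stats = stats[author]
        author_stats["tweets"] += 1  # retweets, quotes and mentions count as tweets
        if "retweeted_status" in tweet:
            target = tweet["retweeted_status"]["user"]["id_str"]
            author_stats["retweets"][target] += 1
            continue  # assumed: mentions inside a retweet are not counted again
        if "quoted_status" in tweet:
            target = tweet["quoted_status"]["user"]["id_str"]
            author_stats["quotes"][target] += 1
        for mention in tweet.get("entities", {}).get("user_mentions", []):
            author_stats["mentions"][mention["id_str"]] += 1
    return stats


# Hypothetical tweets reduced to the fields used above.
sample = [
    {"user": {"id_str": "1"},
     "retweeted_status": {"user": {"id_str": "2"}},
     "entities": {"user_mentions": [{"id_str": "2"}]}},
    {"user": {"id_str": "1"},
     "quoted_status": {"user": {"id_str": "2"}},
     "entities": {"user_mentions": []}},
    {"user": {"id_str": "1"},
     "entities": {"user_mentions": [{"id_str": "3"}]}},
    {"user": {"id_str": "2"}, "entities": {"user_mentions": []}},
]
stats = count_interactions(sample)
```

The resulting counters give, for every follower, both the per-target counts and (by summing a counter) the totals needed to compute weights and discounting coefficients.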
Table 2 shows that some users are independent of each other despite a follow relationship between them. For example, the user is independent from the user and the same for the user with with a lower degree of dependence. All experiments are made on the real data described in Table 1, which were collected from Twitter. Users are numbered to preserve anonymity and privacy. Therefore, the follow relationship in Twitter does not necessarily imply cognitive dependence between users. Explicitly, a user $u$ who follows another user $v$ in Twitter can be either cognitively independent of or dependent on $v$.
Table 2: Examples of independence relationships (columns: Link, Degree of dependence)
Table 3 shows that when a user $u$ is dependent on a user $v$, $v$ is not necessarily dependent on $u$. Likewise, when a user $u$ is independent of a user $v$, $v$ is not necessarily independent of $u$.
Table 3: Examples of asymmetrical relationships (columns: Link, Degree of dependence)
Table 4 shows that if two users $u$ and $v$ are mutually independent or dependent, their degrees of independence or dependence are not necessarily equal.
Table 4: Examples of mutual independence/dependence with different degrees of independence/dependence (columns: Link, Degree of dependence)
Tests are made on data collected from 05/06/2017 to 13/08/2017, as detailed in Table 1. Degrees of independence and dependence are computed for each pair of users, over all the users. Note that for each pair of users $(u, v)$ we compute both the degree of $u$ with respect to $v$ and the degree of $v$ with respect to $u$. The complete graph therefore contains one node per user and one independence value per ordered pair of users. For the tests, we have also estimated the degree of independence/dependence for users without any follow relationship.
The dependence graph of Figure 3 is a part of the complete graph. In Figure 3, only users from the users are represented. These users are randomly chosen for the sake of simplicity and to keep the graph readable. Black links represent follow links; the bold part of each link reflects the direction of the follow. In other words, is following and ; is following ; is following ; is following , and ; is following , and ; is following , and ; is following and ; is following ; is following and finally is following , , and . Note that , , , , are mutually following each other.
Figure 3 shows that some users are cognitively dependent, for example is dependent on with a degree ; is dependent on and ; is dependent on and ; is dependent on and ; finally is dependent on S10 and .
Finally, , , , , , , , , , , , , ,, and are independent. Note that and , and , and are mutually independent.
Table 5 (columns: Users, Degree of dependence)
Table 5 shows that users without any follow relationship are independent. For example there is no follow between and because is not following and is not following . Users and are mutually independent. Users that are not following others are independent: when a user $u$ is not following another user $v$, $u$ is necessarily independent of $v$.
5 Conclusion
Studying cognitive independence relationships among the users of the Twitter social network is an important research topic, since this online social network is widely used to post and share information. In fact, quantifying the degrees of dependence between users can be very useful for disseminating information to the largest number of users, which matters in many fields such as marketing.
Most existing works that study the dependence between users in a social network use only the network structure to measure the dependence of a user on another user and ignore many interesting dependence aspects. However, a dependence measure that is based only on the network structure is not adequate to quantify the dependence between sources. In fact, in the Twitter social network, a user can follow another user without necessarily being cognitively dependent on him.
In this work, we propose an approach based on the theory of belief functions for measuring the dependence degrees between users in Twitter. We consider three dependence aspects, which are the retweets, the mentions and the citations, and we use the Dempster-Shafer theory to model each dependence aspect, to combine them while taking into consideration the conflict that can arise between them, and to make a decision with regard to the dependence of a user on another user in the network.
The results of the experimental study of our approach show that the follow relationship in Twitter does not necessarily imply cognitive dependence between users and that the more the number of retweets, citations and/or mentions increases, the more the degree of dependence of a user on another user increases, and vice versa. They also show that the dependence relationship between two users is not necessarily mutual and that the dependence degrees between them are not necessarily equal.
As future work, we will use our approach to detect communities in social networks.
References
- Milos Kudelka, Pavla Drázdilová, Eliska Ochodkova, Katerina Slaninová, and Zdenek Horak. Local community detection and visualization: Experiment based on student data. In Proceedings of the Third International Conference on Intelligent Human Computer Interaction (IHCI 2011), Prague, Czech Republic, August 2011, pages 291–303. Springer, 2011.
- Siwar Jendoubi, Arnaud Martin, Ludovic Liétard, Hend Ben Hadji, and Boutheina Ben Yaghlane. Two evidential data based models for influence maximization in Twitter. Knowledge-Based Systems, 2017.
- A. P. Dempster. Upper and lower probabilities induced by a multiple valued mapping. The Annals of Mathematical Statistics, 1967.
- G. Shafer. A mathematical theory of evidence. Princeton University Press, 1976.
- P. Smets. Decision making in the TBM: the necessity of the pignistic transformation. International Journal of Approximate Reasoning, 2005.
- R. Leenders. Modeling social influence through network autocorrelation: Constructing the weight matrix. Social Networks, 24:21–47, 2002.
- Chenhao Tan, Jie Tang, Jimeng Sun, Quan Lin, and Fengjiao Wang. Social action tracking via noise tolerant time-varying factor graphs. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '10, pages 1049–1058, New York, NY, USA, 2010. ACM.
- Mouna Chebbah, Arnaud Martin, and Boutheina Ben Yaghlane. Combining partially independent belief functions. Decision Support Systems, 73:37–46, 2015.