Learning Sentimental Influences from Users’ Behaviors
Abstract
Modeling interpersonal influence on different sentimental polarities is a fundamental problem in opinion formation and viral marketing. An effective solution for learning sentimental influences from users' behaviors has not yet been seen. Previous related works on information propagation directly define the interpersonal influence between each pair of users as a parameter that is independent of the others, even if the influences come from or affect the same user. Moreover, influences are learned from users' propagation behaviors, namely temporal cascades, while sentiments are not associated with them. We therefore propose to model the interpersonal influence by latent influence and susceptibility matrices defined on individual users and sentiment polarities. Such low-dimensional and distributed representations naturally couple the interpersonal influences related to the same user, and in turn reduce the model complexity. Sentiments act on different rows of the parameter matrices, depicting their effects in modeling cascades. The model is learned by projected stochastic gradient descent over shuffled mini-batches with the Adadelta update rule, and negative cases are repeatedly sampled from the distribution of users' infection frequencies to reduce the computation cost and the optimization imbalance. Experiments are conducted on a Microblog dataset. The results show that our model achieves better performance than the state-of-the-art and pairwise models. In addition, analyzing the distribution of the learned sentimental influences and susceptibilities of users yields some interesting discoveries.
1 Introduction
Collective opinions form through a repeated process in which a user who sees and agrees with a piece of sentimental content forwards, shares, or “Likes” it to feed her virtual community, resulting in a temporal cascade of users' behaviors. As such, users who have taken actions become infective to others in their communities, encouraging them to a certain extent to take the same action through interpersonal influence. Each pair of users has a specific influence, especially when there exists some kind of social relation [19, 2]. Therefore, both opinion formation [39, 5] and viral marketing [31, 27] see the importance of learning sentimental influences between users, with which one can better model the dynamics of cascades [19] and maximize influence [24, 14, 6].
Among the existing works, to the best of our knowledge, there is hardly an effective one for estimating sentimental influence between pairs of users. Most studies focus on the process by which users repeatedly update their opinions, the consensus they can reach [10, 34, 5], and opinion influence maximization [14], assuming that the interpersonal influences, as edge weights, are given or equally assigned. In the related domain of information propagation, influence learning has been studied without considering the different propagation behaviors on sentiments, although some studies modeled them on the distribution of content topics [29, 11]. Moreover, Goyal et al. [19] “learned” interpersonal influences by counting the successful propagation pairs, and with the Bernoulli or Jaccard Index model they estimated the influences as propagation probabilities. However, what is usually recorded is the time at which each user gets infected, while the successful propagation pairs, that is, which user infects which, are rarely observed or hard to trace. This limits the application of such a method. NetInf [16] used an exponential or power-law form of incubation time between a pair of infected users to estimate the interpersonal influence, with empirically assigned parameters. Afterward, more novel models were proposed to learn interpersonal influence by maximizing the likelihood of observed cascades, which proved to be more effective [33, 15, 18]. However, they used a free scalar parameter directly defined on a pair of users to represent the interpersonal influence. On one hand, the parameters are independent, even if the influences are exerted by or applied to the same user. On the other hand, such a pairwise parameter cannot be trained if no propagation is observed between the user pair, for example when two users never appear in the same cascade and are infected in Cascade 1 and Cascade 2 respectively, as Figure 1 shows.
In that case, even though the two users have a relationship and form a social triangle with a third user, a zero or some empirically small constant is assigned, implying that propagation between them will never or seldom succeed in the future. In another way, Aral and Walker [2] proposed to model the interpersonal influence by engineered features and corresponding linear coefficients learned for individual users rather than directly for user pairs. But users' properties may not be available or easy to extract in other applications.
To fill this gap in previous works, we propose to learn distributed representations of users' influences and susceptibilities on sentiments. With such representations, we model the interpersonal influences and their decay with elapsed time in the hazard function of a survival model, and maximize the likelihood of the observed behaviors of users taking or not taking actions in sentimental cascades. Hence, the interpersonal influences of two user pairs that share the same influencing or influenced user are coupled through the representation of that shared user. For example, in Figure 1 the influences that one user exerts on two different users are coupled by the influence representation defined on that user. Besides, the number of parameters grows with the number of users rather than with the number of user pairs, which reduces the model complexity and in turn combats the overfitting problem of assigning an empirical propagation probability. Moreover, the number of infected users in a cascade is usually much smaller than the number of uninfected ones, which serve as negative cases. Considering all the negative users prevents the model from being applied to a real large dataset and unbalances the optimization. Thus negative sampling is employed to consider the expectation of negative cases instead, emphasizing the users who are frequently infected in other cascades. Finally, a mini-batch Stochastic Gradient Descent (SGD) algorithm with Projected Gradient (PG) is designed to learn the model, and Adadelta is used to adjust the learning rate adaptively. In such a scheme, negative sampling is repeated in each iteration with a small number of samples each time, to approximate the expectation.
A set of cascades with different sentiments is collected from Microblog, covering a group of users who interact frequently. Compared with the state-of-the-art models, including the above Bernoulli and Jaccard Index estimation methods and pairwise models, our model achieves better performance on the tasks of predicting cascade dynamics, “who will be retweeted”, and cascade size with users' representations. Besides, learning influences separately on different sentimental polarities mostly benefits the performance on these tasks, even though more parameters are brought in. At last, users' representations on different sentiments are analyzed as well. We find that users may have different influences on different sentiments and are susceptible to different polarities. The “original influentials” are creative in posting original attractive messages, while the “secondary influentials” gain their influence credits by hunting for and advertising interesting messages that already exist in the system.
The rest of the paper is organized as follows. Section 2 reviews the existing and related works. The motivation and our model are described in Section 3, where the parameter learning algorithm is also given. Experiments and result analysis are reported in Section 4, and Section 5 concludes the whole work.
2 Related Work
Sentiment propagation and opinion formation have attracted many research works. [40, 25] experimentally showed, on a LiveJournal dataset and a Facebook dataset respectively, that users' sentiments were influenced by those of others surrounding them. [4] used Granger causality analysis to show that sentiment changes of audiences were related to the landscape of popular users on Twitter. As for modeling opinion dynamics, successful models were proposed, including the Sznajd model [35], the Deffuant model [9], and the Hegselmann and Krause model [21], which produced agreeing results. Moreover, [32] extended the Sznajd model to complex networks. Deffuant et al. [9] modeled the process of opinion dynamics by randomly selecting two users and changing their opinions to reduce the difference. [22] changed users' opinions according to the arithmetic average of those of their neighbors, and Fortunato et al. [13] extended the model with a multidimensional opinion vector instead of a scalar opinion value. Besides, Suchecki et al. [34] studied the Voter model in scale-free, small-world, and random networks. A recent work by Bindel et al. [5] observed that traditional models, including the DeGroot model [10], finally converge to a state of consensus under a set of general conditions, while consensus is rare in real opinion dynamics. Hence it proposed to model users' intrinsic beliefs in a game-theoretic framework, which counterbalance the opinions at Nash equilibrium. In addition, Gionis et al. [14] studied the overall positive opinion maximization problem, adopting the game model [5] of opinion dynamics. [1] modeled a user's opinion as generated from her latent opinion distribution, based on a self-excited Hawkes process influenced by her neighbors.
The body of the above works is mostly on opinion dynamics and maximization, assuming that the sentimental influences between connected users are equal. This does not conform to our observations in real life, in which a minority of influential users infect an exceptional number of their peers [23], and there is a mass of easily influenced users [39]. Thus, as a fundamental problem, sentimental influences have been estimated by counting under a Bernoulli assumption or by a threshold rule [39]. As far as we know, an effective method of learning sentimental influences from users' behaviors remains unexplored.
Nevertheless, there are quite a few successful works on estimating interpersonal influences in the related domain of information propagation. Some of them made efforts to extract features that are related to the propagation probability and learned from the observed information cascades. Crane et al. [8] measured the response function of information propagation dynamics in social systems with endogenous and exogenous factors. Artzi et al. [3] predicted whether a user would respond to or retweet a message, that is, get influenced, by classifying with demographic and content features. Apart from feature extraction, Tang et al. [36] proposed the topic factor graph (TFG) to model the generative process of topic-level social influence on large networks by finding a topic distribution for each user. [29] proposed a probabilistic factor graph to model the direct and indirect influences between adjacent and non-adjacent users of a heterogeneous network. Saito et al. [33] learned the propagation probability between neighbors of a directed network under the independent cascade model, using the orders of users getting influenced as training data. Goyal et al. [19] proposed to estimate the interpersonal influences as propagation probabilities in a counting manner, under the assumptions of the Bernoulli model and the Jaccard Index respectively. NetInf [16] adopted both exponential and power-law incubation time models with fixed parameters as pairwise probabilities to infer the underlying network. Besides, a series of works learn a propagation probability between any pair of users with the survival model and its variants, to infer underlying networks with the transmission rates. NetRate [15] used survival theory to model the transmission rate between every pair of users, which was viewed as an edge weight for the pair.
[17] then modeled the hazard rate in the survival model with additive and multiplicative risks separately to improve the performance of cascade size prediction. Afterward, InfoPath [18] was proposed to learn time-varying transmission rates for user pairs as the edge weights of the hidden dynamic network. Taken together, these methods work in a pairwise manner, i.e., they learn the propagation probability between pairs of users, which is fundamentally different from the method proposed in this paper, which focuses on inferring user-specific influence and susceptibility from historical cascades. The features in influence and susceptibility representations were analyzed by [2], which showed that the propagation probability was determined by the two feature vectors, and learned the correlations between users' attributes to identify influential or susceptible users. [38] then proposed a sequence model to learn a user's latent representation of influence and susceptibility, based on the orders in which users get infected. In this work, we propose to learn the distributed representations of a user on sentiments, and a continuous-time model is employed to consider the infection times of users and the effect of the elapsed time on interpersonal influences, rather than their orders only.
3 Learning Sentimental Influence
A cascade is the snapshot of a propagation process, recording the times at which users take actions on the same target, such as a piece of information or a product. Users taking actions become infected, and may influence others since such actions are publicly visible or pushed to the related users on purpose by the online service. Thus we define a cascade for actions on a target as a temporal sequence
where each element pairs a user with the time at which she takes the action, and the length of the sequence is the total number of infected users, i.e., the cascade size. Since social networks are not always available or do not exist in many applications, such as blogs, Yelp, YouTube, and online shopping, network structures are ignored to make our model generally applicable. That is to say, the influence between any pair of users is modeled, and a very small influence value can capture an underlying disconnection in the user network, and vice versa. Moreover, we will see in the following that our model can honor a social network as well in the objective. In addition, a special time is defined as the end of the time window in which we observe a cascade, namely the time when we take the snapshot of the cascade.
3.1 Motivations
Interpersonal influences differ considerably, especially when some social relationship exists. Most existing works intuitively model the interpersonal influences in a pairwise manner with independent variables to learn, assuming that the interpersonal influences between different pairs of users are independent of each other, even if the influences are related to a common user. The resulting overfitting problem becomes severe when no propagation is observed between a pair of connected users. Taking Figure 1 as an example, two cascades are observed. One user is never seen to be infected before another, even though there is a social link from the former to the latter. In such a case, most existing models took the propagation probability or transmission rate between them as zero or some empirically small value [20], implying that successful propagation between the two users would never or seldom be seen in the future. Nevertheless, having witnessed propagation involving a common third user in each of the two cascades, the former user probably influences the latter, as in the triangle pattern of friendship relations. Thus, with the distributed representations of influence and susceptibility defined for every user, interpersonal influences can be correlated through the shared representations of the same user. As shown in the example of Figure 1, the influences of two user pairs with the same influencing user are coupled through the shared representation of that user's influence. And the interpersonal influence from the former user to the latter can be intuitively estimated from the learned representations of the former's influence and the latter's susceptibility, rather than set to a small empirical constant or zero.
At last, not all the users take actions in a cascade. Among the total number of users, there are always a large number who are immune to a contagion, i.e., who remain uninfected; they are treated as negative cases and are informative in reflecting the interpersonal influences from the infected users to them. Without network constraints, considering all the uninfected users incurs a much higher computational cost, which may even be intractable. Moreover, the severe imbalance between positive (infected) cases and negative (uninfected) cases makes the negative likelihood dominate the optimization of the whole objective, losing focus on positive cases, as the following shows.
where the superscript shows that the values are related to a particular cascade. It is seen that the right term in the summation easily dominates the objective, since the number of uninfected users is relatively very large. So we use sampled users as negative cases. Moreover, the infection frequency of a user indicates how easily she could get infected again. Thus observing a frequently infected user who is immune to a contagion provides more information in the likelihood, and sampling negative cases from the distribution of users' infection frequencies is a better choice for learning influences.
3.2 Survival Analysis Model
We begin by briefly introducing the preliminary knowledge on the Survival Analysis Model [26, 17]. We consider the time $T$ at which a user takes the action as a continuous random variable defined over $[0, \infty)$. Let $f(t)$ and $F(t)$ denote the probability density function (p.d.f.) and the cumulative distribution function (c.d.f.) respectively, so that $\Pr(T \le t) = F(t)$. The probability of a user not taking the action until time $t$ is then given by the survivor function
$$S(t) = \Pr(T > t) = 1 - F(t).$$
A hazard function is defined as the instantaneous infection rate in the time interval $[t, t + dt)$, where $dt$ is an infinitesimal elapsed time, given that a user survives until time $t$:
$$h(t) = \lim_{dt \to 0} \frac{\Pr(t \le T < t + dt \mid T \ge t)}{dt} = \frac{f(t)}{S(t)}.$$
Noticing that $f(t) = -S'(t)$ and $S(0) = 1$, the survivor function can be expressed as
$$S(t) = \exp\left(-\int_0^t h(x)\,dx\right).$$
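The relations between density, survivor function and hazard can be checked numerically. The sketch below is only an illustration: it assumes an exponential waiting time with a hypothetical rate `lam` (not a quantity from this paper), computes the hazard as density over survivor function, and confirms that exponentiating the negative integrated hazard recovers the survivor function.

```python
import math

def pdf(t, lam):
    """Density f(t) of an exponential waiting time (illustrative choice only)."""
    return lam * math.exp(-lam * t)

def cdf(t, lam):
    """F(t) = Pr(T <= t)."""
    return 1.0 - math.exp(-lam * t)

def survivor(t, lam):
    """S(t) = 1 - F(t): probability of not having acted by time t."""
    return 1.0 - cdf(t, lam)

def hazard(t, lam):
    """h(t) = f(t) / S(t): instantaneous rate given survival until t."""
    return pdf(t, lam) / survivor(t, lam)

def survivor_from_hazard(t, lam, steps=10_000):
    """S(t) = exp(-integral_0^t h(x) dx), using a midpoint Riemann sum."""
    dx = t / steps
    integral = sum(hazard((i + 0.5) * dx, lam) for i in range(steps)) * dx
    return math.exp(-integral)
```

For the exponential case the hazard is constant and equal to `lam`, so both routes to the survivor function agree up to rounding.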
3.3 Modeling sentimental cascades
With the analysis above, we model the interpersonal influence by two non-negative matrices defined on each user, one for influence and one for susceptibility. Each matrix has one row per sentiment class, and the number of columns is the dimension of users' representations on each sentiment class. For a message with sentimental opinion, we define a one-hot vector with one entry per sentiment class, representing its exclusive sentiment class. Thus, for a cascade with a given sentiment, the transmission rate function from one user to another is defined as equation (1), which indicates the likelihood of successful propagation between them. Although the original concept of transmission rate is not necessarily between 0 and 1, we scale it for regularization.
(1) 
where the two matrices are parameters that separately capture the influence of the sending user and the susceptibility of the receiving user. For simplicity, a single symbol denotes the whole set of parameters. With the transmission rate, we can define the hazard function from one user to another in the Survival Analysis Model at a given time as follows.
(2) 
where the time-dependent factor makes the hazard function decay monotonically with the time elapsed since the influencing user was infected, and adding 1 avoids an unbounded hazard rate due to a zero or infinitesimal elapsed time. Noticing that equation (2) holds only after the influencing user has been infected, we define the hazard rate as zero otherwise, namely when that user has not been infected at the given time. Moreover, we can take the social network into account by defining the hazard function as zero as well if the two users are not connected.
Then the survivor function, that is, the probability that a user survives beyond a given time under the influence of an infected user, satisfies
(3) 
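A minimal sketch of how the transmission rate, hazard and survivor function fit together is given below. The functional forms are assumptions rather than formulas recovered from the text: the rate is taken as `1 - exp(-x)` of the inner product of the two sentiment rows (one way to scale a non-negative score into [0, 1)), and the decay is taken as `1 / (t - t_u + 1)`, which matches the description of a hazard that decays with elapsed time with 1 added. The names `I_u`, `S_v` and `k` are illustrative.

```python
import numpy as np

def transmission_rate(I_u, S_v, k):
    """Assumed form: 1 - exp(-<I_u[k], S_v[k]>), where row k is the cascade's
    sentiment class. Non-negative rows keep the result in [0, 1)."""
    return 1.0 - np.exp(-np.dot(I_u[k], S_v[k]))

def hazard(t, t_u, phi):
    """Assumed hazard phi / (t - t_u + 1) once user u is infected (t > t_u), else 0."""
    if t <= t_u:
        return 0.0
    return phi / (t - t_u + 1.0)

def log_survivor(t, t_u, phi):
    """log S = -(integral of the hazard from t_u to t) = -phi * log(t - t_u + 1)."""
    if t <= t_u:
        return 0.0
    return -phi * np.log(t - t_u + 1.0)
```

Selecting row `k` plays the role of the one-hot sentiment vector: only the row of the cascade's sentiment class enters the rate.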
Finally, the probability density function of a user getting infected at a given time, given that the influencing user was infected at an earlier time, is calculated as the product of the hazard function and the survivor function, as follows.
With the assumption that a user is only infected by one of the previously infected ones [17], the likelihood of a user being infected at a given time in a cascade is
(4) 
So the joint likelihood of observing the whole cascade, given the user who first takes the action, is
Considering the negative cases, that is, users who are not infected at the end, the probability of such a user surviving beyond the observation window is
And the log-likelihood of a cascade, considering negative cases, is as follows.
There are a large number of negative users compared with the number of infected ones in a cascade. Maximizing the likelihood of all the negative cases limits the scalability of our model, and the imbalance between positive and negative cases may mislead the optimization direction. Thus we sample a small number of users as negative cases according to a distribution based on users' infection frequencies [30], that is, on how often each user is infected across cascades. It is worth noticing that the sampling of negative cases is repeated in every optimization iteration to honor the expectation. To give a direct understanding of the likelihood, the dependencies are concisely represented in Figure 2.
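Frequency-based negative sampling can be sketched as follows. The exact sampling distribution is not legible in this text, so the smoothing exponent `power=0.75` is an assumption borrowed from common negative-sampling practice, and the function names are illustrative. Users already infected in the cascade are excluded, and a fresh sample is meant to be drawn at every iteration.

```python
import numpy as np

def negative_sampling_probs(infection_counts, power=0.75):
    """Probabilities proportional to (infection frequency) ** power.
    The exponent is an assumption, not a value taken from the paper."""
    counts = np.asarray(infection_counts, dtype=float)
    weights = counts ** power
    return weights / weights.sum()

def sample_negatives(infection_counts, infected, n_samples, rng, power=0.75):
    """Draw negative users for one cascade, skipping its infected users.
    Assumes at least one uninfected user has a positive infection count."""
    probs = negative_sampling_probs(infection_counts, power)
    probs[np.array(sorted(infected), dtype=int)] = 0.0  # infected users are not negatives
    probs = probs / probs.sum()
    return rng.choice(len(probs), size=n_samples, replace=True, p=probs)
```

Calling `sample_negatives` again in each mini-batch iteration with the same generator yields a new small set each time, which is how repeated sampling approximates the expectation over negative cases.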
Finally, the optimization problem of learning users' sentimental influences and susceptibilities is
(5a)  
s.t.  (5b) 
where the superscript indicates that a value or function is related to a particular cascade.
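To make the structure of the objective concrete, the sketch below computes a cascade log-likelihood in the style of survival-based cascade models such as NetRate [15]: each infected user contributes the log of the summed hazards from earlier users plus the log survivor terms with respect to them, and each negative user contributes log survivor terms up to the end of the observation window. The hazard form `phi / (gap + 1)` and the callable `rate(u, v)` are assumptions for illustration, not the paper's exact equations.

```python
import math

def cascade_log_likelihood(users, times, negatives, T, rate):
    """users, times: infected users and strictly increasing infection times;
    negatives: uninfected users; T: end of the observation window;
    rate(u, v): hypothetical transmission rate from u to v."""
    ll = 0.0
    for i in range(1, len(users)):          # the first user is given, not modeled
        v, t_v = users[i], times[i]
        hazard_sum = 0.0
        for u, t_u in zip(users[:i], times[:i]):
            phi = rate(u, v)
            gap = t_v - t_u + 1.0
            ll += -phi * math.log(gap)      # v survived u's influence until t_v
            hazard_sum += phi / gap         # any single earlier user may infect v
        ll += math.log(hazard_sum)
    for v in negatives:                     # negatives survive every infected user until T
        for u, t_u in zip(users, times):
            ll += -rate(u, v) * math.log(T - t_u + 1.0)
    return ll
```

With all uninfected users passed as `negatives` this corresponds to the full objective; passing a sampled subset gives the negative-sampling variant.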
3.4 Optimization
The optimization algorithm is the key to learning the distributed representations of users' influences. First of all, the gradients of the transmission rate function (1) with respect to the influence and susceptibility matrices are themselves matrices.
where only the row corresponding to a cascade's sentiment class can have nonzero gradients in both matrices. Furthermore, if a user gets infected in a cascade, the gradients of the log-likelihood with respect to her influence matrix may be nonzero. And the gradients of the log-likelihood with respect to her susceptibility matrix may be nonzero if she is infected after the first user, or if she is a negative user. Otherwise, the gradients are always zero. As the negative cases for a cascade are repeatedly sampled in every iteration, we define the set of negative users separately for each iteration of the algorithm and each cascade.
where the remaining symbol denotes the size of that set.
Therefore, the gradients of the objective function (5a) with respect to the influence and susceptibility matrices are as follows.
where the indicator function outputs 1 if its argument is true and 0 otherwise. The two gradients are matrices containing the partial derivatives of the objective function (5a) with respect to each element of the influence and susceptibility matrices respectively.
The framework of Stochastic Gradient Descent (SGD) over shuffled mini-batches is employed for efficient optimization. The mini-batch size is set to 12 cascades. In order to handle the non-negativity constraints on the parameters, Projected Gradient (PG) [28] is used to adjust the gradients. Consider the parameter updates of the influence and susceptibility matrices of each user, and let the overall parameter matrices be the concatenation of these matrices over all users. The updates are then reduced by a fixed rate if the following condition does not hold.
(6) 
Here the parameters are indexed by iteration, the objective function (5a) is written in a simplified form as a function of the parameters, the trace of a matrix appears in the condition, and the remaining coefficient is a constant between 0 and 1.
Moreover, since deciding the learning rate is not trivial, we choose Adadelta [41] to tune it adaptively, using a decay rate and a small constant. The accumulated gradients are
And with the root-mean-square function of Adadelta [41], the update values are calculated as
Let the projection function map a value into the non-negative space, namely, a value is kept if it is non-negative and is set to zero otherwise. The algorithm for learning users' sentimental influences is listed in Algorithm 1.
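One update step can be sketched as below, assuming the standard Adadelta recursion of [41] followed by projection onto the non-negative orthant. This is a simplified illustration: it is written for minimizing a loss (for example the negative log-likelihood), it omits the step-reduction check of condition (6), and the default `rho` and `eps` values and the function names are assumptions.

```python
import numpy as np

def init_state(shape):
    """Running averages of squared gradients and squared updates."""
    return {"g2": np.zeros(shape), "dx2": np.zeros(shape)}

def adadelta_projected_step(param, grad, state, rho=0.95, eps=1e-6):
    """One Adadelta step on a loss gradient, then projection to param >= 0."""
    # Accumulate the squared gradient with decay rate rho.
    state["g2"] = rho * state["g2"] + (1.0 - rho) * grad ** 2
    # Update = -(RMS of past updates / RMS of gradients) * gradient.
    delta = -np.sqrt(state["dx2"] + eps) / np.sqrt(state["g2"] + eps) * grad
    # Accumulate the squared update for the next step.
    state["dx2"] = rho * state["dx2"] + (1.0 - rho) * delta ** 2
    # Projection: negative entries are set to zero.
    return np.maximum(param + delta, 0.0)
```

In a mini-batch loop, `grad` would be the gradient summed over the cascades of the current batch, with one `state` kept per parameter matrix.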
4 Evaluations
Microblog data is used to evaluate our model. To make the application more general, we assume that the retweeting relations and following relations are not available in the evaluations, keeping only the temporal sequence of users taking actions, i.e., retweets, and their infection times as the dataset. We then demonstrate the performance of our model on well-known tasks by comparing it with the state-of-the-art models, and the learned sentimental influences are analyzed as well.
4.1 Data Description
Several strategies are taken to collect Microblog data from Sina Weibo.
Afterward, in an “onion peeling” manner, we repeatedly deleted from each cascade the records of users with activeness less than 5, as well as the records of the users retweeting them. In each iteration, cascades of size less than 8 were deleted as well, since very short cascades are considered accidents.


In this heuristic way, we finally obtain a set of cascades over a virtual community of active users from Oct 31, 2013 to Mar 3, 2014. As listed in Table 1(a), there are 6,219 users and 44,021 cascade records in total. The number of cascade messages with positive emoticons is 325, and the number with negative ones is 412, keeping the observations balanced for learning sentimental influences. Figure 3 illustrates the distributions of the most frequently used emoticons in the messages of cascades, indicating their positive or negative sentiments. Furthermore, the median and mode of the distribution of users' activeness are 5 and 4 respectively, as in Table 1(b), indicating that users' behaviors are observed often enough in our dataset to support successful learning. The table also gives the median and mode of the distribution of cascade sizes, showing that sufficiently many users are involved in a cascade. The cascades in the dataset are evenly split into 10 groups, and 10-fold cross testing is used for evaluation, each time with 9 of the 10 groups for training and the remaining one for testing.
4.2 Evaluation Models
In the experiments, we choose the following models for comparison.

CT Bernoulli and CT Jaccard models [19]: These are continuous-time models in which the propagation probability from an infected user to another user decays with the elapsed time. For a fair comparison, we use the same decay function and the same assumption that a user is only infected by one of the infective users. The CT Bernoulli model assumes that the initial propagation probability follows a Bernoulli distribution, i.e., it is the number of successful propagations divided by the total number of trials from one user to another. The CT Jaccard model defines the initial propagation probability in the form of a Jaccard Index, which is the number of successful propagations divided by the total number of cascades in which at least one user of the pair is infected. Since only a temporal sequence of users getting infected is observed in the training dataset, we assume that successful propagation takes place from every earlier infected user to the current one.

NetRate [15]: It directly defines a scalar parameter as the interpersonal influence between a pair of users, and learns these parameters with the survival model. Since the Jaccard Index was reported to be a better estimator of propagation probability [19], we use it to initialize the transmission rates at the beginning of the learning stage, for better fine-tuning.

CT LIS: We ignore the differences of latent influence and susceptibility across the sentiments of cascade messages, and define two vectors per user, one for influence and one for susceptibility, for the transmission rate function instead. Such parameters were previously defined by [38], which used a static way to model the orders of users' behaviors. So we use “CT LIS” to denote our upgraded version for the continuous-time model.

Sent LIS: It is our model that learns sentimental influences considering all the negative cases. And we use “Sent LIS (neg sample)” to indicate ours with negative sampling.
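The counting-based initial probabilities of the CT Bernoulli and CT Jaccard baselines can be sketched as follows, under the stated assumption that every earlier infected user in a cascade counts as a successful propagation to each later one. Treating "trials" as the number of cascades in which the source user is infected is an interpretation, and the function name is illustrative.

```python
from collections import Counter
from itertools import combinations

def estimate_pairwise_probabilities(cascades):
    """cascades: list of user lists, each in infection order, each user at most once.
    Returns two functions giving Bernoulli-style and Jaccard-style estimates."""
    success, appear, both = Counter(), Counter(), Counter()
    for cascade in cascades:
        appear.update(cascade)
        for u, v in combinations(cascade, 2):   # u is infected before v
            success[(u, v)] += 1
            both[frozenset((u, v))] += 1

    def bernoulli(u, v):
        # successes from u to v over the cascades in which u was infected
        return success[(u, v)] / appear[u] if appear[u] else 0.0

    def jaccard(u, v):
        # successes over the cascades in which u or v (or both) was infected
        union = appear[u] + appear[v] - both[frozenset((u, v))]
        return success[(u, v)] / union if union else 0.0

    return bernoulli, jaccard
```

Both estimates are zero for a pair that never co-occurs in a cascade, which is the limitation of pairwise counting discussed in the motivation.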
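Table 2: Results of the PCD task (average and SD over the 10-fold cross tests).

| Metric | Statistic | CT Bernoulli | CT Jaccard | NetRate (Jaccard) | CT LIS | Sent LIS | Sent LIS (neg sample) |
|---|---|---|---|---|---|---|---|
| MRR | Average | 0.0062 | 0.0064 | 0.0071 | 0.0196 | 0.0216 | 0.0265 |
| MRR | SD | 0.0029 | 0.0036 | 0.0038 | 0.0039 | 0.0033 | 0.0044 |
| AUC | Average | 0.8732 | 0.8621 | 0.8718 | 0.8793 | 0.8992 | 0.8983 |
| AUC | SD | 0.0658 | 0.0802 | 0.0730 | 0.0207 | 0.0152 | 0.0156 |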
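Table 3: Results of the WBR task (average and SD over the 10-fold cross tests).

| Metric | Statistic | CT Bernoulli | CT Jaccard | NetRate (Jaccard) | CT LIS | Sent LIS | Sent LIS (neg sample) |
|---|---|---|---|---|---|---|---|
| Acc | Average | 0.1221 | 0.3000 | 0.3005 | 0.4123 | 0.3840 | 0.3980 |
| Acc | SD | 0.0365 | 0.0964 | 0.0961 | 0.0874 | 0.1255 | 0.1392 |
| MRR | Average | 0.2592 | 0.4349 | 0.4354 | 0.4696 | 0.4822 | 0.4920 |
| MRR | SD | 0.0703 | 0.1275 | 0.1273 | 0.0876 | 0.1269 | 0.1348 |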
4.3 Tasks and evaluation metrics
The following tasks are used to evaluate the effectiveness of our learned sentimental influences and the improvements over the other models. The metrics for each task are introduced as well.
PCD: predicting cascade dynamics. Both the happening times and the infected users of cascade dynamics are predictable by our model. However, to keep the task simple and easy to evaluate, we design it to predict whether a user will be infected at a given time, knowing the previous truth, namely the users who have been infected before that time and their happening times. On one hand, the task can be treated as a set of binary classification problems, and we evaluate the results with the infected users as the positive cases and the users who remain uninfected until the end of the observation window as the negative ones. For a positive case, the likelihood is that of the infected user at her given infection time. For a negative user, the likelihood is calculated as if she had been infected right after the positive ones, that is, at the time of the last infection plus a very small constant. With the likelihood values for all the users, the true positive (TP) rate and the false positive (FP) rate can be calculated for any threshold. The AUC (the area under the ROC curve) can then be evaluated as in [12], where the ROC curve is drawn with the TP rate and FP rate as the coordinates.
On the other hand, given a time and the observation of a cascade before that time, we can calculate the infection likelihood for each candidate user. By ranking the candidates by their likelihood values, the top ones are the most likely to be infected, and a well-performing model gives a high rank to the users who actually get infected at that moment. In this way, the Mean Reciprocal Rank (MRR) [37] of the rankings at all times of users getting infected in cascades is calculated as the metric.
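The two PCD metrics can be sketched in a few lines. The AUC below uses the equivalent pairwise formulation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half) instead of tracing the ROC curve, and the reciprocal rank breaks ties optimistically; both choices and the function names are assumptions for illustration.

```python
def reciprocal_rank(scores, target):
    """scores: dict mapping candidate -> likelihood; rank 1 is the highest score.
    Candidates tied with the target do not push its rank down (optimistic ties)."""
    rank = 1 + sum(1 for s in scores.values() if s > scores[target])
    return 1.0 / rank

def mean_reciprocal_rank(queries):
    """queries: list of (scores, target) pairs, one per infection event."""
    return sum(reciprocal_rank(s, t) for s, t in queries) / len(queries)

def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly, ties counted as 1/2."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for a sketch; a sort-based computation would be preferable for large evaluations.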
WBR: who will be retweeted. Microblog users get infected and take actions to retweet the message from one of their followees who posted or retweeted it previously. Thus the task of predicting “who will be retweeted” is a way to examine interpersonal influence quantitatively. In the scenario of multiple exposures, a user with higher interpersonal influence has a higher probability of being forwarded. As such, given that a user gets infected at a certain time, the infective user that she retweets is
We therefore treat the prediction task as a ranking problem of interpersonal influence. A user with a higher rank is more likely to be retweeted. We evaluate the prediction performance by the average Accuracy (Acc) of top-one prediction and by MRR. The ground truth of retweets can be extracted from the content of Microblog messages. Larger values of Acc and MRR indicate better predictions.
CSP: Cascade size prediction. Cascade size prediction, as a key part of influence maximization and viral marketing, is one of the most important applications based on modeling cascade dynamics. In our setting of the CSP task, we take the first users of each cascade and their acting times as the initialization, and predict the cascade size at a later time, measured from the acting time of the last initial user. A simulation method is used to predict the cascade size with the dynamics models. The prediction time span is evenly split and marked by time scales. Starting after the initialization, an infected user tries to influence an uninfected user at each time scale with the probability
If a user is infected at a time scale by such sampling, she is added to the infected users for the following time scales. The simulations are repeated, and the average cascade size is reported as the prediction. The prediction is evaluated by the mean absolute percentage error (MAPE), where a smaller value indicates a better prediction.
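The simulation procedure and the MAPE metric can be sketched as follows. The per-step infection probability is not legible in this text, so it is passed in as a hypothetical callable `infect_prob(u, v, elapsed_steps)`; the function names and the choice that newly infected users become infective from the next time scale are likewise assumptions.

```python
import random

def simulate_cascade_size(seed_users, all_users, n_steps, infect_prob, rng):
    """One simulation run: at each time scale every infected user tries once
    to infect every uninfected user."""
    infected = {u: 0 for u in seed_users}       # user -> time scale of infection
    for step in range(1, n_steps + 1):
        newly = []
        for v in all_users:
            if v in infected:
                continue
            for u, s_u in infected.items():
                if rng.random() < infect_prob(u, v, step - s_u):
                    newly.append(v)
                    break
        for v in newly:                          # infective from the next time scale
            infected[v] = step
    return len(infected)

def predict_cascade_size(seed_users, all_users, n_steps, infect_prob, rng, n_runs=100):
    """Average cascade size over repeated simulations."""
    sizes = [simulate_cascade_size(seed_users, all_users, n_steps, infect_prob, rng)
             for _ in range(n_runs)]
    return sum(sizes) / n_runs

def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction; smaller is better."""
    return sum(abs(p - a) / a for a, p in zip(actual, predicted)) / len(actual)
```

For example, `mape([10, 20], [8, 25])` averages the relative errors 0.2 and 0.25.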
4.4 Evaluation results.
As described for the dataset, we split the whole dataset into 10 groups for cross testing. Thus each experiment is repeated 10 times, and the average metrics and the Standard Deviation (SD) are reported. The dimension of users' representations on a sentimental polarity is set to a fixed value in the following evaluations for computational efficiency.
PCD: Figure 4 illustrates the ROC curves of the evaluated models for one of the 10-fold cross tests. It shows visually that our models “CT LIS”, “Sent LIS” and “Sent LIS (neg sample)” achieve better performance when the task is formulated as binary classification, and that NetRate, with the Jaccard Index as its parameter initialization, improves on the “CT Jaccard” model. For the overall evaluation over the 10-fold cross tests, Table 2 lists the average results and SDs of all the models, with the best and second-best MRRs and AUCs in bold. Our models achieve MRRs of 0.0216 and 0.0265, significantly outperforming the other models (significance test, p-value < 0.01). Our negative sampling model performs best, thanks to its balancing of positive and negative cases. The results of “CT Bernoulli” and “CT Jaccard” consistently show that the Jaccard Index beats the Bernoulli model in estimating propagation probability, as reported in [19]. In terms of binary classification, “Sent LIS” and “Sent LIS (neg sample)” both outperform the others in AUC, with 0.8992 and 0.8983 respectively, the former being slightly better. Besides, the machine learning model NetRate can further tune the Jaccard Index to achieve better MRR and AUC values. Most importantly, in both the ranking and the classification formulations of predicting cascade dynamics, the pairwise models, namely “CT Bernoulli”, “CT Jaccard” and NetRate, are limited in performance compared with the proposed models, which learn distributed representations of users. This shows the advantage of our models in mitigating overfitting and reducing model complexity.
WBR: With the retweeting relations extracted from the retweet content as the ground truth, the evaluation results of the WBR task are reported in Table 3. The top-one accuracies (Acc) and MRRs of all cascades are averaged over the 10-fold cross tests as well, and the significance is tested. The bold numbers are the best and second-best performances. Again, with distributed representations of users, “CT LIS”, “Sent LIS”, and “Sent LIS (neg sample)” outperform the pairwise models, which suffer from overfitting on unobserved propagation pairs. Compared with NetRate, the three LIS models improve the accuracy of predicting “who will be retweeted” by 37.2%, 27.8% and 32.4% respectively, and increase the MRR by 7.9%, 10.7% and 13.0% respectively. Among the pairwise models, “CT Jaccard” retains its advantage over “CT Bernoulli” in both metrics, and “NetRate (Jaccard)” is the best of the three, thanks to machine learning with the survival model. Finally, with negative sampling, “Sent LIS (neg sample)” on the one hand balances the positive and negative cases, and on the other hand takes the information of negative cases into account in expectation, resulting in a better choice of descent gradient. In turn, it achieves better performance in both accuracy and MRR than “Sent LIS”, which testifies to the advantages of our model.
Metric  Statistic  CT Bernoulli  CT Jaccard  NetRate (Jaccard)  CT LIS  Sent LIS  Sent LIS (neg sample)
MAPE  Average  0.7199  0.7105  0.7109  0.6259  0.6259  0.6362
MAPE  SD  0.0270  0.0333  0.0350  0.0883  0.1458  0.0.2252
CSP: In the experiments, we choose the first infected users as the initialization for prediction, and the number of simulations for each cascade is 100 for efficiency. The results on the averaged cascade sizes, with 10-fold cross testing, are reported for all methods in Table 4. It is seen that “CT LIS” and “Sent LIS” without and with negative sampling outperform the pairwise models, achieving 0.6259, 0.6259 and 0.6362 respectively in MAPE. Moreover, compared to the best-performing pairwise model (“CT Jaccard”, 0.7105), we reduce MAPE by about 10.46% even in the least favorable case (0.6362), which shows the advantage of our learned representations of users’ influences in cascade size prediction.
Furthermore, to show the differences between the transmission rates learned by our model “Sent LIS (neg sample)” and by the pairwise model NetRate, we calculate our transmission rates separately on positive sentiment and on negative sentiment, from the latent sentimental influence and susceptibility matrices of equation (1). Each pair of users gives a point whose x-coordinate is our transmission rate and whose y-coordinate is that of NetRate. We count the number of points falling in each lattice cell, as illustrated in Figure 5 (a) and (b), where the cells are colored from cold to warm according to the point counts. A long and very warm line lies along the x-axis from 0.1 to 0.4 in the figures for both positive and negative sentiment. It indicates that many of the overfitted transmission rates, to which NetRate assigns zero or a small constant, can be estimated by the distributed representations of users with values that vary between user pairs. Besides, the higher transmission rates from NetRate are also spread discriminatively over our transmission rates, as the horizontally aligned warm cells show. The same conclusion can be drawn from Figure 5 (c) and (d). All of the above gives evidence that our learned sentimental influences are better able to discriminate among the influential and the susceptible users, resulting in the good performance in the above evaluation tasks.
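The lattice counting behind such a comparison can be sketched as follows. The sketch assumes that the transmission rate between two users on one sentiment is the dot product of the source user's influence row and the target user's susceptibility row for that sentiment; this is an assumption made for illustration and may differ from the exact form of equation (1). The function names and the cell width are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def transmission_rate(influence_row: Sequence[float],
                      susceptibility_row: Sequence[float]) -> float:
    """ASSUMPTION: rate = dot product of the influence and susceptibility rows."""
    return sum(i * s for i, s in zip(influence_row, susceptibility_row))


def lattice_counts(points: Iterable[Tuple[float, float]],
                   cell: float) -> Counter:
    """Count how many (x, y) points fall into each square lattice cell.

    x is the rate from the distributed representations, y the pairwise rate.
    """
    counts: Counter = Counter()
    for x, y in points:
        counts[(int(x // cell), int(y // cell))] += 1
    return counts


# Hypothetical user pairs: (our rate, NetRate rate).
points = [(0.12, 0.0), (0.18, 0.01), (0.35, 0.0), (0.05, 0.31)]
counts = lattice_counts(points, cell=0.1)
```

Cells along the bottom row (second index 0) collect the pairs to which the pairwise model assigns a near-zero rate while the distributed representations assign distinct values; coloring each cell by its count gives a heat map of the kind described above.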
4.5 Analysis of users’ sentimental influences and susceptibilities
Besides comparing the evaluated models, we investigate the learned distributed representations of users on sentiments, namely the influence and susceptibility matrices of each user. Each row of these matrices is the representation of the user’s influence or susceptibility on the corresponding sentiment, denoted as “Positive I”, “Negative I”, “Positive S”, and “Negative S”. We use the L1-norm of these row vectors to measure the degrees of influence and susceptibility on sentiments. Once more, we construct points for users with those L1-norm values as coordinates, and count the number of points falling into each predefined lattice cell. The contour maps are drawn accordingly in Figure 6. Figure 6 (a) and (b) are the contour maps of users’ influences vs. susceptibilities on positive sentiment and negative sentiment respectively. There are two peaks in both contour maps. It is interesting to see the peaks near the “Positive I” and “Negative I” axes, which show that there is a considerable number of influential users who are not susceptible to others, as [2] claimed. We name them original influentials, in both positive and negative sentiment. On the other hand, another group of influential users forms the other two peaks, located at the upper right of the contour maps; they are susceptible and active in retweeting others’ messages, and we name them secondary influentials in both sentiments. In other words, the secondary influentials may put a lot of effort into retweeting attractive messages to gain reputation and influence, while the original influentials focus on composing attractive, original messages for the system. Thus the original influentials are the primitive power of the system, bringing in new resources, and the secondary influentials are good advertisers who spread information to people.
Finally, the contour maps of Figure 6 (c) and (d) each show a main peak in a 2-dimensional view: (c) gives the distribution of users’ influences on positive sentiment and negative sentiment, and (d) that of users’ susceptibilities on both sentiments. From Figure 6 (c), it is seen that users may have higher influence on positive sentiment and lower influence on negative sentiment, or vice versa, although a certain number of them have almost the same high influence on both sentiments. Figure 6 (d) leads to a similar conclusion on susceptibilities: some users are more sensitive to positive sentiments, and others are more sensitive to negative ones. It also seems that, in the dataset, more users have equally high susceptibilities on both sentiments than have equally high influences.
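The L1-norm degrees used as coordinates can be computed as in the following sketch. The dictionary layout, with one row vector per sentiment, and the function names are hypothetical; only the use of the L1-norm of each row comes from the text.

```python
from typing import Dict, Sequence


def l1_norm(row: Sequence[float]) -> float:
    """Sum of absolute values of a row vector."""
    return sum(abs(x) for x in row)


def sentiment_degrees(influence: Dict[str, Sequence[float]],
                      susceptibility: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Map one user's matrices to the four degrees used as plot coordinates."""
    return {
        "Positive I": l1_norm(influence["positive"]),
        "Negative I": l1_norm(influence["negative"]),
        "Positive S": l1_norm(susceptibility["positive"]),
        "Negative S": l1_norm(susceptibility["negative"]),
    }


# Hypothetical user with a 2-dimensional representation per sentiment.
user_influence = {"positive": [0.5, 0.25], "negative": [0.0, 0.125]}
user_susceptibility = {"positive": [0.0, 0.0], "negative": [1.0, 0.5]}
degrees = sentiment_degrees(user_influence, user_susceptibility)
```

A point such as (“Positive I”, “Positive S”) for every user, binned into lattice cells, would give the counts from which a contour map of influence against susceptibility is drawn.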
5 Conclusions
We propose a model to learn distributed representations of users’ influences on sentiments from their historical behaviors. By explicitly characterizing the sentimental influence and susceptibility of each user with two matrices respectively, the model reduces the complexity of pairwise models, and in turn mitigates the overfitting problem. We also design an effective algorithm to train the model by maximizing the logarithmic likelihood of information cascades. The Adadelta method is used to estimate an efficient learning rate adaptively, and the projected gradient (PG) method guarantees the non-negativity constraints on the parameters. Our model does not require knowledge of the social network structure, and hence is widely applicable to scenarios with or without explicit social networks. An explicit social network can be added as indicators in the likelihood of a user getting infected by the connected and infective ones. We evaluated the effectiveness of our model on a Microblog dataset from Sina Weibo, the largest social media platform in China. Experimental results demonstrate that our model consistently outperforms existing pairwise methods at predicting cascade dynamics, predicting “who will be retweeted”, and predicting cascade size. Moreover, by analyzing users’ sentimental influences and susceptibilities, we find that there are two peaks in the contour maps, indicating original influentials and secondary influentials. The former create original, high-quality messages to influence others, while the latter attract others’ attention by retweeting interesting messages. Besides, users may react differently to messages with different sentiments. In the future, we would like to apply the distributed representations of users to more imaginative applications.
6 Acknowledgments
This work was funded by the National Grand Fundamental Research 973 Program of China (No. 2013CB329602, No. 2013CB329606), and the National Natural Science Foundation of China (Nos. 61572467, 61232010). The authors thank the crowdsourcing platform (http://www.cnpameng.com/) for providing the initial Sina Weibo data.
Footnotes
 Sina Weibo (http://www.weibo.com) is the biggest site for Microblog service in China.
References
 Learning opinion dynamics in social networks. arXiv preprint arXiv:1506.05474, 2015.
 S. Aral and D. Walker. Identifying influential and susceptible members of social networks. Science, 337(6092):337–341, 2012.
 Y. Artzi, P. Pantel, and M. Gamon. Predicting responses to microblog posts. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics, pages 602–606, 2012.
 Y. Bae and H. Lee. Sentiment analysis of twitter audiences: Measuring the positive or negative influence of popular twitterers. Journal of the American Society for Information Science and Technology, 63(12):2521–2535, 2012.
 D. Bindel, J. Kleinberg, and S. Oren. How bad is forming your own opinion? Games and Economic Behavior, 92:248–265, 2015.
 S. Cheng, H. Shen, J. Huang, G. Zhang, and X. Cheng. StaticGreedy: Solving the scalability-accuracy dilemma in influence maximization. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pages 509–518, 2013.
 Y. Choi, C. Cardie, E. Riloff, and S. Patwardhan. Identifying sources of opinions with conditional random fields and extraction patterns. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 355–362, 2005.
 R. Crane and D. Sornette. Robust dynamic classes revealed by measuring the response function of a social system. Proceedings of the National Academy of Sciences, 105(41):15649–15653, 2008.
 G. Deffuant, D. Neau, F. Amblard, and G. Weisbuch. Mixing beliefs among interacting agents. Advances in Complex Systems, 3(01n04):87–98, 2000.
 M. H. DeGroot. Reaching a consensus. Journal of the American Statistical Association, 69(345):118–121, 1974.
 N. Du, L. Song, H. Woo, and H. Zha. Uncover Topic-Sensitive Information Diffusion Networks. In Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, pages 229–237, 2013.
 T. Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 27:861–874, 2006.
 S. Fortunato, V. Latora, A. Pluchino, and A. Rapisarda. Vector opinion dynamics in a bounded confidence consensus model. International Journal of Modern Physics C, 16(10):1535–1551, 2005.
 A. Gionis, E. Terzi, and P. Tsaparas. Opinion maximization in social networks. In SDM, pages 387–395. SIAM, 2013.
 M. Gomez-Rodriguez, D. Balduzzi, and B. Schölkopf. Uncovering the Temporal Dynamics of Diffusion Networks. In Proceedings of the 28th International Conference on Machine Learning, pages 561–568, 2011.
 M. Gomez-Rodriguez, J. Leskovec, and A. Krause. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1019–1028, 2010.
 M. Gomez-Rodriguez, J. Leskovec, and B. Schölkopf. Modeling information propagation with survival theory. In Proceedings of the 30th International Conference on Machine Learning, pages 666–674, 2013.
 M. Gomez-Rodriguez, J. Leskovec, and B. Schölkopf. Structure and dynamics of information pathways in online media. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining, pages 23–32, 2013.
 A. Goyal, F. Bonchi, and L. V. Lakshmanan. Learning influence probabilities in social networks. In Proceedings of the 3rd ACM International Conference on Web Search and Data Mining, pages 241–250, 2010.
 A. Goyal, F. Bonchi, and L. V. Lakshmanan. A data-based approach to social influence maximization. Proceedings of the VLDB Endowment, 5(1):73–84, 2011.
 R. Hegselmann, U. Krause, et al. Opinion dynamics and bounded confidence models, analysis, and simulation. Journal of Artificial Societies and Social Simulation, 5(3), 2002.
 M. R. Hestenes. Multiplier and gradient methods. Journal of optimization theory and applications, 4(5):303–320, 1969.
 E. Katz and P. F. Lazarsfeld. Personal Influence, The part played by people in the flow of mass communications. Transaction Publishers, 1955.
 D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 137–146, 2003.
 A. D. Kramer, J. E. Guillory, and J. T. Hancock. Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences, 111(24):8788–8790, 2014.
 J. F. Lawless. Statistical Models and Methods for Lifetime Data, volume 362. 2011.
 J. Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing. ACM Transactions on the Web, 1(1), May 2007.
 C. J. Lin. Projected gradient methods for nonnegative matrix factorization. Neural Computation, 19(10):2756–2779, 2007.
 L. Liu, J. Tang, J. Han, M. Jiang, and S. Yang. Mining Topic-level Influence in Heterogeneous Networks. October, pages 199–208, 2010.
 T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119, 2013.
 M. Richardson and P. Domingos. Mining knowledge-sharing sites for viral marketing. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 61–70, 2002.
 F. A. Rodrigues and L. da F. Costa. Surviving opinions in Sznajd models on complex networks. International Journal of Modern Physics C, 16(11):1785–1792, 2005.
 K. Saito, R. Nakano, and M. Kimura. Prediction of information diffusion probabilities for independent cascade model. In Knowledge-Based Intelligent Information and Engineering Systems, pages 67–75, 2008.
 K. Suchecki, V. M. Eguíluz, and M. San Miguel. Voter model dynamics in complex networks: Role of dimensionality, disorder, and degree distribution. Physical Review E, 72(3):036132, 2005.
 K. Sznajd-Weron and J. Sznajd. Opinion evolution in closed community. International Journal of Modern Physics C, 11(06):1157–1165, 2000.
 J. Tang, J. Sun, C. Wang, and Z. Yang. Social influence analysis in large-scale networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 807–816, 2009.
 E. M. Voorhees. The TREC-8 Question Answering Track Report. In Text REtrieval Conference, 1999.
 Y. Wang, H. Shen, S. Liu, and X. Cheng. Learning user-specific latent influence and susceptibility from information cascades. In Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
 D. J. Watts and P. S. Dodds. Influentials, networks, and public opinion formation. Journal of consumer research, 34(4):441–458, 2007.
 R. Zafarani, W. D. Cole, and H. Liu. Sentiment propagation in social networks: a case study in livejournal. In Advances in Social Computing, pages 413–420. Springer, 2010.
 M. D. Zeiler. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701, 2012.