Influence Maximization via Representation Learning
Abstract
Although influence maximization has been studied extensively in the past, the majority of works focus on the algorithmic aspect of the problem, overlooking several practical improvements that can be derived from data-driven observations or the inclusion of machine learning. The main challenges are, on the one hand, the computational demand of the algorithmic solution, which restricts scalability, and, on the other, the quality of the predicted influence spread. In this work, we propose IMINFECTOR (Influence Maximization with INFluencer vECTORs), a method that aspires to address both problems using representation learning. It comprises two parts. The first is based on a multitask neural network that uses logs of diffusion cascades to embed diffusion probabilities between nodes as well as the ability of a node to create massive cascades. The second part uses diffusion probabilities to reformulate influence maximization as a weighted bipartite matching problem and capitalizes on the learned representations to find a seed set using a greedy heuristic approach. We apply our method to three sizable networks accompanied by diffusion cascades and evaluate it using unseen diffusion cascades from future time steps. We observe that our method outperforms various competitive algorithms and metrics from the diverse landscape of influence maximization, in terms of prediction precision and seed set quality.
George Panagopoulos École Polytechnique george.panagopoulos@polytechnique.edu Michalis Vazirgiannis École Polytechnique and AUEB mvazirg@lix.polytechnique.fr Fragkiskos D. Malliaros CentraleSupélec and Inria Saclay fragkiskos.malliaros@centralesupelec.fr
Introduction
With the advent of the social web, social media have been broadly utilized to spread news, establish trends or even shape public opinions. This has motivated a substantial amount of research on how social media users affect each other. Traces of such influence can be detected in the users' sharing activity, such as tweets, Facebook posts, Instagram stories, etc. Formally, social influence is defined as a directed measure between two users and represents how likely it is for the target user to adopt the behavior or copy the action of the source user. The research on social influence can be divided into two main branches that address different problems: influence maximization and influence learning.
Influence maximization is typically formulated as a combinatorial optimization problem where the aim is to find the set of nodes in a network that would maximize the reach of a diffusion cascade starting from them (?). Although the problem was inspired by viral marketing, it can be applied to several other disciplines, such as limiting misinformation (?) and opinion polarization (?). Influence maximization algorithms face scalability issues primarily because they compute the number of nodes infected by a set of seed users (influence spread) with simulations of diffusion models. Apart from the computational burden, diffusion models rely on several assumptions that may lead the algorithm to unreliable results (?). In other words, most of these solutions depend on structural information and oversimplified social contagion models, overlooking how diffusions really take place over the network. However, along with new datasets that contain diffusion cascade logs (?; ?), the question emerged of how to measure influence in a data-driven manner, which leads to the other branch of research approaches.
Influence learning has attracted significant attention in the past few years. Initially, it consisted primarily of methods that learned the parameters defining the influence probabilities of diffusion models (?), or the edge probabilities themselves (?). However, these approaches suffered from overfitting, as the number of parameters was proportional to the number of edges. To overcome this, one can either focus on learning the diffusion model using hyperparameters (?) or utilize representation learning to capture influence between two nodes. A representative example is EmbeddIC (?), an adaptation of the independent cascade where the influence probabilities are expressed as a combination of the nodes' embeddings. Each node is associated with two embeddings, a source and a target, which combined construct the probability of influencing and getting influenced. This idea has been intertwined with neural networks to develop models that can be used to predict the evolution of a diffusion cascade, such as the next infected node (?; ?), the time of the next infection (?; ?), or whether the cascade will become viral or not (?).
In this work, we attempt to utilize the advantages of representation learning in the service of influence maximization. To this end, we start by analyzing the behavior of influencers in a large-scale social network to quantify the importance of certain factors relevant to an influencer's success. The conclusions of this brief analysis lead us to the creation of the input to our machine learning model, INFECTOR (INFluencer vECTORs). INFECTOR is a novel neural architecture that aims to embed in continuous vectors the amount of influence exerted between influencers and nodes as well as the influencers' overall success in spreading information. Furthermore, we develop a heuristic algorithm, called IMINFECTOR, that performs Influence Maximization using the representations derived from INFECTOR. The quality of the retrieved seed set is determined by a set of unseen diffusion cascades from future time steps, similar to a train and test split in machine learning (?). We deem this data-driven evaluation strategy more reliable than traditional evaluations based on simulations of diffusion models because it relies on actual traces of influence. This is the first time, to the best of our knowledge, that representation learning is used for influence maximization with promising results. Our contributions can be summarized as follows:

Analysis of influencer activity that provides two new insights: successful influencers mostly start diffusion cascades rather than participate in them, and their cascades tend to be faster than those of mediocre users.

The INFECTOR model: a novel multitask learning neural network that captures simultaneously the influence relationships between nodes and the aptitude of a node to create massive diffusion cascades.

Reformulation of the classical influence maximization as a weighted bipartite matching problem using the diffusion probabilities.

The IMINFECTOR algorithm: a method that uses the aforementioned representations to produce a seed set for influence maximization.

A comparison of both network-based and diffusion-based metrics and algorithms on three sizable real-world datasets.
Reproducibility: The implementation of IMINFECTOR, code to reproduce the analysis, and links to the real-world datasets used in the experiments are openly available at https://github.com/GiorgosPanagopoulos/IMINFECTOR.
Background and Related Work
We provide an overview of influence maximization and network representation learning background, as both are essential to the proposed methodology. Starting with the latter, node2vec (?) is a popular representation learning technique that capitalizes on the structure of the network to derive node vectors (embeddings) that translate the structural correlations of the nodes into a real space. The model is an unsupervised neural network with one hidden layer that receives a node as input and outputs a probability distribution over the rest of the network's nodes. It is trained using node-context pairs, where the context is a list of nodes that occur in random walks starting from the node under consideration. The intuition is that if a node appears in the context of another node, they exhibit similar structural properties and thus their embeddings must be close. A similar method, called inf2vec, has been developed to capture influence relationships between two nodes (?). In this case, the contexts are derived from real diffusion cascades and node-context pairs are constructed using a combination of random sampling and random walks. This process quantifies influence relationships between nodes that are connected in a network, similar to the aforementioned literature (?).
Influence maximization is the problem of identifying a set of nodes in a network such that if an infection starts from them, it would reach the maximum possible number of nodes. The greedy solution starts with an empty seed set and iteratively adds the node that provides the best marginal gain, i.e., the maximum increase of the set's influence spread (?). Influence spread is the number of nodes infected by a given seed set and is computed by simulating the progress of a diffusion cascade through the network with models such as the Independent Cascade or the Linear Threshold. Instead of running simulations of these stochastic processes, the algorithm estimates the influence spread based on the live-edge model, a Monte Carlo sampling of the edges that under many repetitions converges to an unbiased estimate of the influence spread. The function of the influence spread is submodular, which means that as the seed set grows, the contribution of a node to the influence spread can only decrease or stay the same. Due to this, the algorithm is guaranteed to reach a solution that approximates the optimal. However, the influence spread estimation is still prohibitively time-consuming for real-world networks, which has led to algorithms based on sketching and reverse reachable sets (?; ?), as well as fast heuristic methods (?). An important problem with this general approach is that it relies on oversimplistic diffusion models, and the influence probabilities are derived from the weighted cascade model or set uniformly at random. This calls into question the quality of the retrieved seed set, as more and more studies indicate that structure alone cannot capture the properties of influence and diffusion (?; ?). There have been a number of attempts to examine influence maximization using both the network and diffusion cascades (?), but overall, it is a largely overlooked field.
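To make the greedy procedure described above concrete, the following is a minimal sketch in Python. It is an illustration under stated assumptions rather than the algorithm of any cited work: the graph is a toy adjacency list with per-edge influence probabilities, the spread is estimated by plain Monte Carlo simulation of the Independent Cascade model (not by live-edge sampling), and the names `simulate_ic`, `estimate_spread` and `greedy_im` are hypothetical.

```python
import random

def simulate_ic(graph, seeds, rng):
    """Run one Independent Cascade simulation and return the infected set.

    graph maps a node to a list of (neighbor, probability) pairs (assumed layout).
    """
    infected = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each newly infected node gets one chance to infect each neighbor.
                if v not in infected and rng.random() < p:
                    infected.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return infected

def estimate_spread(graph, seeds, runs, rng):
    """Average number of infected nodes over several simulations."""
    return sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs)) / runs

def greedy_im(graph, nodes, k, runs=200, seed=0):
    """Add, k times, the node whose inclusion gives the largest estimated spread."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best_node, best_spread = None, -1.0
        for u in nodes:
            if u in chosen:
                continue
            spread = estimate_spread(graph, chosen + [u], runs, rng)
            if spread > best_spread:
                best_node, best_spread = u, spread
        chosen.append(best_node)
    return chosen
```

Every greedy step re-estimates the spread of every remaining candidate, which is the cost that makes this approach prohibitive on large networks.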
Influencer Representation Learning
Context Creation
An important question in influence analysis is whether successful sharing or resharing contributes more to an influencer's popularity. In other words, does it suffice for an influencer to create massive cascades, or does she also have to participate in cascades started by others? The sampling process of previous influence learning models assumes that the participation of a node in other cascades is important, because node-context pairs are derived for each node in the diffusion cascade (?). Although this is a more complete approach, it is too time-consuming, given that the average size of a cascade can surpass 60 nodes in today's datasets. More specifically, creating the propagation network, which is a realization of the network (e.g. follow edges) under the observed diffusion cascade (e.g. retweets) and is a standard practice in data-driven influence maximization (?), requires looping through each node in the cascade and iterating over the subsequent nodes to search for a directed edge in the network, which has complexity $O(c\,\bar{n}^2)$, where $c$ is the number of cascades and $\bar{n}$ is the average cascade size. This procedure is unavoidable because the true influence paths are missing from the data.
In an effort to overcome this burden, we examine the question posed above in the diffusion cascades of the Sina Weibo dataset, a large-scale social network accompanied by retweet cascades. Each cascade represents a tweet and its set of retweets. In our case, we have separated the final year of the dataset into train and test cascades based on their time of occurrence. We keep the 18,652 diffusion cascades from the last month of recording as a test set and the 97,034 from the previous 11 months as a train set. We call them test and train set, similarly to machine learning, because we will use them for the evaluation of our algorithm, as we describe in the experimental section of the paper. To evaluate our hypothesis that successful influencers mostly start diffusion cascades rather than participate in them, we first rank all seed users found in the test set based on three measures of success: the number of test cascades they spawn, their cumulative size, and the number of Distinct Nodes Influenced (DNI), i.e., the size of the set of nodes that participated in them (?; ?). We separate all users into three categories for each metric, based on how high they rank on it. Then, we compute for each category the total number of cascades they start in the training set as opposed to those they simply participate in. As is visible in Figure 1, users that belong to the top category of the test cascades in all metrics are far more likely to initiate a cascade than to participate in one.
With that in mind, we propose a new approach to create node-context pairs to serve as input to a neural network for influence embeddings. Instead of building the propagation network and deriving context for each node in it, we compute only the context of the source node, i.e. the initiator. The context is derived by sampling from the nodes in the cascade, taking another important piece of information into account. To be more specific, an important characteristic in the study of social influence is its temporal dynamics. In the case of diffusion cascades, the time passed between two nodes' activity is known to play a role in the amount of influence the source node exerts on the target (?). Yet, previous attempts to learn influence representations did not capitalize on this attribute (?; ?). In our dataset, we can observe this phenomenon by studying the copying times in the diffusion cascades. The copying time is defined as the time passed between the initiator's tweet and a node's retweet. As mentioned above, we focus on the nodes that have started a cascade in the test set (initiators) and use the DNI as a measure of a node's influence. We compute the average copying time in the train cascades of each test initiator, and average it to get an estimate of how fast the test initiators get "copied" during their train cascades. Subsequently, we group the initiators based on their DNI (in the test set) and compute the average copying time of each group to plot it against the DNI in Figure 2.
This plot indicates that nodes with higher influence tend to initiate much faster cascades than ordinary influencers.
Apart from confirming our intuition, this observation buttresses several findings that underline the decay of influence as the copying time increases in real-world diffusion cascades (?). Moreover, the inverse of the copying time has been employed in the past for social network embeddings (?).
To this end, we create an initiator’s context by sampling over all nodes in the cascade with probability inversely proportional to their copying time, so the faster the retweet, the more probable that node will appear in the context of the initiator.
For the current analysis, we do an oversampling of 120% to emphasize the importance of fast resharing in the depiction of influence.
This approach reduces the aforementioned complexity of node-context creation to $O(c\,\bar{n})$, i.e., linear in the number of cascades $c$ and the average cascade size $\bar{n}$, since the context is created in linear time from the nodes in the cascade without requiring a search of the underlying network. It should be noted here that our method's final purpose is influence maximization, which is why we focus so much on the activity of influencers and overlook the rest of the nodes. If we aimed for another task, such as predicting the course of a diffusion or recommending friendships, it is unclear whether this type of sampling would be effective.
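The context creation described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not spelled out in the text: a cascade is taken to be a time-ordered list of `(node, timestamp)` pairs whose first entry is the initiator, the 120% oversampling is interpreted as drawing 1.2 times as many samples as there are resharing nodes (with replacement), and nodes with a zero copying time are skipped to avoid division by zero. The function name `initiator_context` is hypothetical.

```python
import random

def initiator_context(cascade, oversample=1.2, seed=0):
    """Sample the context of a cascade's initiator.

    cascade: list of (node, timestamp) pairs, initiator first (assumed layout).
    Nodes are drawn with probability inversely proportional to their copying
    time, so faster resharers appear more often in the context.
    """
    rng = random.Random(seed)
    initiator, t0 = cascade[0]
    # Copying time = time between the initiator's post and the node's reshare.
    followers = [(node, t - t0) for node, t in cascade[1:] if t > t0]
    if not followers:
        return initiator, []
    nodes = [node for node, _ in followers]
    weights = [1.0 / dt for _, dt in followers]
    size = max(1, round(oversample * len(followers)))
    return initiator, rng.choices(nodes, weights=weights, k=size)
```

Each cascade is visited once and no edge lookups in the underlying network are needed, which is where the linear cost comes from.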
Learning with the INFECTOR Model
Since our model uses solely the diffusion cascades, we aim to utilize as much information as possible. To this end, we propose to use a multitask learning neural network (?), in order to learn simultaneously the influence relationships and the aptitude of an influencer to create long diffusion cascades. We chose to extend the typical architecture in a multitask learning setting because (i) the problem could naturally be broken into two tasks, and (ii) theoretical and applied literature suggests that training linked tasks together improves learning for all of them (?; ?). In our case, given an input node $u$, the first task is to classify the nodes that it will influence and the second is to predict the size of the cascade it will create.
An overview of our proposed INFECTOR model can be seen in Figure 3. It has two types of inputs. The first is the training set comprised of the node-context pairs, as defined in the previous section, $X^t = \{(u_1, v_1), (u_1, v_2), \ldots\}$, where $u \in \mathbb{R}^{I}$ and $v \in \mathbb{R}^{N}$ are one-hot encoded nodes, with $I$ the number of initiators in the train set and $N$ the number of nodes in the network. The second is a set of node-number pairs $X^c = \{(u_1, c_1), (u_2, c_2), \ldots\}$, each comprised of a node and the length of a cascade it initiated in the train set. To perform joint learning of both tasks we mix the inputs following the natural order of the data; given a cascade, we first input the initiator-context pairs extracted from it and then the initiator-cascade length pair, as shown in Figure 3. $O \in \mathbb{R}^{I \times E}$, with $E$ being the embeddings size, represents the source embeddings, $O_u \in \mathbb{R}^{1 \times E}$ the embedding of cascade initiator $u$, $T \in \mathbb{R}^{E \times N}$ the target embeddings, and $C \in \mathbb{R}^{E \times 1}$ is a constant vector initialized to 1. Note that $O_u$ is retrieved by the multiplication of the one-hot vector of $u$ with the embedding matrix $O$. The first output of the model represents the diffusion probability of the source node $u$ for each node in the network. It is created through a softmax function and its loss function is the cross-entropy:
$$z_{t,u} = O_u T \qquad \text{(Hidden)} \tag{1}$$
$$\varphi_t^{j} = \frac{e^{z_{t,u}^{j}}}{\sum_{i=1}^{N} e^{z_{t,u}^{i}}} \qquad \text{(Output)} \tag{2}$$
$$L_t = -\sum_{j=1}^{N} y_t^{j} \log \varphi_t^{j} \qquad \text{(Loss)} \tag{3}$$
The second output aims to regress the cascade length, which has undergone min-max normalization relative to the rest of the cascades in this set, and hence a sigmoid function is used to bound the output in $(0, 1)$. The loss function is the typical squared loss:
$$z_{c,u} = O_u C \qquad \text{(Hidden)} \tag{4}$$
$$\varphi_c = \frac{1}{1 + e^{-z_{c,u}}} \qquad \text{(Output)} \tag{5}$$
$$L_c = (y_c - \varphi_c)^2 \qquad \text{(Loss)} \tag{6}$$
Here $y_t$ is a one-hot representation of the target node and $y_c$ is the normalized cascade length. We employ a nonlinearity here instead of a simple regression because without it, the updates induced to the hidden layer from the second output would heavily overshadow the ones from the first, and hence the influence relationships would not be captured. Furthermore, we empirically observed that the update of a simple regression would eventually cause the gradient to explode.
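The two output heads described by Eqs. (1) to (6) can be illustrated with a small forward pass. The sketch below is written in plain Python for readability and is not the authors' implementation: the embedding of one initiator is a list of floats, the target embeddings are stored as a list of rows (one per embedding dimension), and all function names are hypothetical.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def diffusion_probabilities(o_u, T):
    """Classification head: softmax of the initiator embedding times the
    target embeddings, giving one probability per node in the network."""
    n_nodes = len(T[0])
    z = [sum(o_u[e] * T[e][v] for e in range(len(o_u))) for v in range(n_nodes)]
    return softmax(z)

def cascade_size_score(o_u, C):
    """Regression head: sigmoid of the embedding times the constant vector,
    compared against the min-max normalized cascade length."""
    return sigmoid(sum(o * c for o, c in zip(o_u, C)))

def cross_entropy(phi_t, target_index):
    """Cross-entropy against a one-hot target reduces to -log of one entry."""
    return -math.log(phi_t[target_index])

def squared_loss(phi_c, y_c):
    return (y_c - phi_c) ** 2
```

With a constant vector of ones, the regression head sees only the sum of the embedding's entries, so fitting large normalized cascade lengths pushes the embedding toward larger values.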
Training
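To elaborate more on the training of the embeddings, from the chain rule we have that:
$$\frac{\partial L}{\partial O_u} = \frac{\partial L}{\partial \varphi}\,\frac{\partial \varphi}{\partial z}\,\frac{\partial z}{\partial O_u} \tag{7}$$
It is straightforward to see that:
$$\frac{\partial z_{t,u}}{\partial O_u} = T^{\top}, \qquad \frac{\partial z_{c,u}}{\partial O_u} = C^{\top} \tag{8}$$
The activation derivative for the classification task is
$$\frac{\partial \varphi_t^{j}}{\partial z_{t,u}^{i}} = \begin{cases} \varphi_t^{j}\,(1 - \varphi_t^{j}), & i = j \\ -\varphi_t^{j}\,\varphi_t^{i}, & i \neq j \end{cases} \tag{9}$$
where $i, j$ are the dimensions of the vectors $z_{t,u}$ and $\varphi_t$. The activation derivative for the regression task is simply $\varphi_c (1 - \varphi_c)$.
Finally, the derivatives of the loss function are given by:
$$\frac{\partial L_t}{\partial \varphi_t^{j}} = -\frac{y_t^{j}}{\varphi_t^{j}}, \qquad \frac{\partial L_c}{\partial \varphi_c} = -2\,(y_c - \varphi_c) \tag{10}$$
$$\frac{\partial L}{\partial O_u} = \begin{cases} (\varphi_t - y_t)\,T^{\top}, & L = L_t \\ -2\,(y_c - \varphi_c)\,\varphi_c (1 - \varphi_c)\,C^{\top}, & L = L_c \end{cases} \tag{11}$$
The training happens in an alternating manner, meaning that when one output is activated the other is idle, thus only one of them can change the embeddings at a training step. More specifically, given a node $u$ that starts a cascade of length 4 as in Figure 3, $O_u$ will be updated 4 times based on the error of $L_t$ and once based on $L_c$, using the same equation and the same learning rate $\eta$ for the training step $\tau$:
$$O_u^{(\tau+1)} = O_u^{(\tau)} - \eta\,\frac{\partial L}{\partial O_u^{(\tau)}} \tag{12}$$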
The intuition here is that the embeddings of the initiators do not only contain the information of who they influence, but also their overall aptitude to create strong cascades. The embeddings $O_u$ undergo:

Updates of certain embedding dimensions using the upper formula from Eq. (11) to form the influence patterns with the output layer $T$.

Updates using the lower formula from Eq. (11), to increase the overall norm of the embedding analogously to the size of the cascade the initiator creates.
The second effect becomes clearer when $C$ is a constant, because the update from the loss $L_c$ changes only the embeddings. This was empirically observed, as we also experimented with $C$ being a variable. We will fully utilize this property later as a core part of our influence maximization algorithm, where we aim to filter out initiators using the norm of their embeddings. This provides substantial acceleration and computational advantages. As a final note, due to the number of nodes in the networks employed for our experiments, calculating the denominator of Eq. (2) is too computationally expensive, so we employ noise contrastive estimation (?). The difference between this and previous influence representation models should be clear, as the latter produce influence probabilities between nodes that are connected in an underlying social network (?; ?), while ours produces diffusion probabilities that are independent of the network.
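The alternating updates described above can be sketched as follows. This is a simplified illustration, not the training code of the paper: only the initiator embedding is updated while the target embeddings and the constant vector are held fixed, the full softmax is used instead of noise contrastive estimation, and the function names are hypothetical. The two gradients are the standard ones for softmax with cross-entropy and for sigmoid with squared loss.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def grad_classification(o_u, T, target):
    """Gradient of the cross-entropy loss w.r.t. the initiator embedding:
    (softmax output - one-hot target) multiplied by the transposed T."""
    n_nodes = len(T[0])
    dims = range(len(o_u))
    z = [sum(o_u[e] * T[e][v] for e in dims) for v in range(n_nodes)]
    phi = softmax(z)
    err = [phi[v] - (1.0 if v == target else 0.0) for v in range(n_nodes)]
    return [sum(err[v] * T[e][v] for v in range(n_nodes)) for e in dims]

def grad_regression(o_u, C, y_c):
    """Gradient of the squared loss w.r.t. the initiator embedding:
    -2 (y_c - phi_c) phi_c (1 - phi_c) times the constant vector C."""
    phi = sigmoid(sum(o * c for o, c in zip(o_u, C)))
    scale = -2.0 * (y_c - phi) * phi * (1.0 - phi)
    return [scale * c for c in C]

def sgd_step(o_u, grad, lr):
    return [o - lr * g for o, g in zip(o_u, grad)]

def train_on_cascade(o_u, T, C, context, y_c, lr=0.1):
    """One update per context node (classification), then a single update
    for the normalized cascade length (regression)."""
    for target in context:
        o_u = sgd_step(o_u, grad_classification(o_u, T, target), lr)
    return sgd_step(o_u, grad_regression(o_u, C, y_c), lr)
```

For a cascade with four context nodes, `train_on_cascade` performs four classification updates followed by one regression update, mirroring the alternating schedule described in the text.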
Influence Maximization with Influencer Vectors
Reformulation of the Problem
We observe that the diffusion probabilities derived from Eq. (2) can substitute the computation of the influence spread, which is typically based on diffusion models, such as the independent cascade model. We start by computing the diffusion probability matrix
$$D = \mathrm{softmax}(O\,T) \in \mathbb{R}^{I \times N} \tag{13}$$
which consists of the initiators in one dimension and all nodes in the other. Softmax is applied to each row of the product matrix to get the diffusion probabilities of each initiator. As a first step, we have to recall from the background section that the influence spread under a specific live-edge sampling is determined by the paths formed by the sampled edges. Intuitively, in our case, the diffusion probability $D_{u,v}$ stands for the probability of a node $v$ appearing in a diffusion started by seed $u$, independently of the two nodes' distance in the network. This means that these probabilities implicitly include the underlying influence paths from the seed to the infected node, in which case we can interpret $D$ as a complete bipartite network. The left side nodes are the candidate seeds (initiators from the training set), and each of them can influence every node on the right side, where the rest of the network resides. Since the edges are all directed from left to right, all the paths with length more than 1 are removed. This means that we no longer require a diffusion model to estimate the spread. Since there are no higher order influence paths, each seed can influence a node only through their direct edge.
For example, in the traditional setting, a node $u$ might be able to influence another node $z$ by influencing a node $v$ between them, as shown in Figure 4. However, in our case, if $u$ could indeed induce the infection of $z$ in a direct or indirect manner, it would be depicted by the diffusion probability $D_{u,z}$. Thus we can remove all influence paths with length more than 1. From a data-driven perspective, this approach captures the case when $v$ appears in the diffusions of $u$ and $z$ appears in the diffusions of $v$ but not in those of $u$. This might happen because node $v$ reshares different types of content, and when this content comes from $u$ it diffuses in different directions than $z$. Hence, it would be wrong to assume that the infection of $u$ would be able to eventually cause that of $z$. Typical influence maximization algorithms that rely on diffusion models fail to capture this effect of higher order correlations. A diffusion model acts in a Markovian manner and can spread the infection from $u$ to $z$. Apart from capturing this effect, the proposed formulation allows us to bypass the complexity induced by the diffusion models in influence maximization and hence overcome the computational bottleneck.
Assuming the diffusion probabilities are independent, the probability of a node $v$ getting influenced by a seed set $S$ is the complement of the probability of not getting influenced by any node $s \in S$. Summing this over all non-seed nodes gives a new influence spread:
$$\sigma'(S) = \sum_{v \in G \setminus S} \Big( 1 - \prod_{s \in S} (1 - D_{s,v}) \Big) \tag{14}$$
To transform this into a set function that computes the infected nodes, we can use a threshold, meaning that a node will get infected if its probability of getting influenced by the seed set at this step is equal to or more than 0.5, which is the value used in classifiers with softmax output. Unfortunately, it is easy to see that this function is not submodular. Think of a toy example with two source nodes, $s_1$ and $s_2$, each of which can influence three other nodes with its own set of probabilities. In the first step, the algorithm will choose $s_1$, as it gives 2 infected nodes as opposed to $s_2$, which gives 0. In the second step, following our definition, the addition of $s_2$ will infect the remaining node, thus the property of diminishing returns does not hold for this influence spread and we cannot utilize the greedy algorithm.
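The thresholded spread and its lack of submodularity can be checked numerically. The sketch below is a minimal illustration: the probability values are hypothetical, chosen only to reproduce the behavior of the toy example (they are not the values of the original example and are not row-normalized), and the function names are assumptions.

```python
def influence_probabilities(D, seeds, targets):
    """Probability of each target being influenced by at least one seed,
    assuming the diffusion probabilities are independent."""
    probs = {}
    for v in targets:
        not_influenced = 1.0
        for s in seeds:
            not_influenced *= 1.0 - D[s][v]
        probs[v] = 1.0 - not_influenced
    return probs

def expected_spread(D, seeds, targets):
    """Sum of the influence probabilities over all targets."""
    return sum(influence_probabilities(D, seeds, targets).values())

def thresholded_spread(D, seeds, targets, threshold=0.5):
    """Number of targets whose influence probability reaches the threshold."""
    probs = influence_probabilities(D, seeds, targets)
    return sum(1 for p in probs.values() if p >= threshold)

# Hypothetical toy values: two sources, three targets.
D = {
    "s1": {"a": 0.6, "b": 0.6, "c": 0.4},
    "s2": {"a": 0.2, "b": 0.2, "c": 0.2},
}
targets = ["a", "b", "c"]

gain_alone = thresholded_spread(D, ["s2"], targets)
gain_after_s1 = (thresholded_spread(D, ["s1", "s2"], targets)
                 - thresholded_spread(D, ["s1"], targets))
```

Here `gain_alone` is 0 while `gain_after_s1` is 1: the marginal gain of `s2` grows when it is added to a larger set, which is exactly what diminishing returns forbids.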
The IMINFECTOR Algorithm
As we saw above, directly optimizing the probability of a target node getting influenced by the seed set is not an option. Instead, if we view the diffusion probabilities as edge weights (a simplification used extensively in similar contexts (?)), we can use the bipartite network to transform the problem into weighted bipartite matching. The main difference with our case is that each initiator (left side) can be assigned to more than one node (right side). This can be alleviated by creating a number of clone seeds for each seed, such that the clones can be assigned to the nodes the initial seed would be assigned to, in a one-to-one fashion. The number of clones for each seed represents the number of nodes the seed can influence. This transforms the problem into traditional perfect matching. To do this, we can define an expectation of a candidate seed's influence spread based on the norm of its embedding. Recall that the embeddings are trained such that the norm $\|O_u\|_2$ captures the initiator's potential to create lengthy cascades. To this end, we define the share of all $N$ nodes that an initiator $u$ is expected to influence as:
$$\lambda_u = \left\lceil N \cdot \frac{\|O_u\|_2}{\sum_{u' \in \mathcal{I}} \|O_{u'}\|_2} \right\rceil \tag{15}$$
where $\mathcal{I}$ is the set of initiators and the second term resembles the norm of $O_u$ relative to the norm of the rest of the initiators. Cloning each node $u$ for $\lambda_u$ times transforms $D$ into a balanced bipartite graph. To find the optimum matching in this case, we could use the Munkres assignment algorithm (?). Unfortunately, it has cubic complexity in the number of nodes, which renders it prohibitive for the networks that we are trying to address. Moreover, constructing a non-sparse matrix at this scale is also not computationally feasible. Thus we rely on a greedy heuristic method instead.
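The norm-based expectation and the norm-based filtering of candidates can be sketched as below. This is an illustrative sketch under assumptions: embeddings are stored in a dictionary from initiator to a list of floats, the result is rounded up so that it can be used as a number of clones, and the names `expected_influence_counts` and `top_candidates` are hypothetical.

```python
import math

def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))

def expected_influence_counts(O, n_nodes):
    """Number of nodes each initiator is expected to influence: its share of
    the total embedding norm, multiplied by the number of nodes, rounded up."""
    norms = {u: l2_norm(vec) for u, vec in O.items()}
    total = sum(norms.values())
    return {u: math.ceil(n_nodes * norm / total) for u, norm in norms.items()}

def top_candidates(O, percent):
    """Keep the given percentage of initiators with the largest embedding norm."""
    keep = max(1, math.ceil(len(O) * percent / 100))
    ranked = sorted(O, key=lambda u: l2_norm(O[u]), reverse=True)
    return ranked[:keep]
```

For instance, with two initiators whose embedding norms are 3 and 1 in a network of 10 nodes, the expected counts are 8 and 3 respectively.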
The first thing we have to do is reduce the number of candidate seeds in order for $D$ to fit in memory. To do this, we rank all initiators based on the norm of their source embeddings and keep the top $P\%$. Afterwards, we have to define the assignment mechanism between a seed and the nodes it influences, which in influence maximization terms represents the influence spread. In an analogy to the bipartite matching, we assign a seed to the node that connects to it with the edge that has the maximum weight among all of the seed's edges. Subsequently, the first clone is assigned to the node with the second maximum edge, the second clone to the third edge, etc. The number of assignments amounts to the number of the seed's clones, which is $\lambda_s$. Thus, for a seed $s$, its influence spread is the total edge weight of the nodes it influences and is given by
$$\sigma'(s) = \sum_{j=1}^{\lambda_s} \hat{D}_{s,j} \tag{16}$$
where $\hat{D}_s$ are the edge weights of seed $s$ sorted in descending order. Having defined this, it is straightforward to devise a greedy heuristic where we add to the seed set the node with the maximum $\sigma'(s)$. Once a seed is added to the seed set, the nodes assigned to it cannot be reassigned and hence they are removed from $D$. This means that the influence spread of a seed will never get bigger in two subsequent rounds of the greedy iterations, and thus we can employ the CELF trick (?) to accelerate our solution. The algorithm is given in Algorithm 1 and is a straightforward adaptation of CELF (?) using the aforementioned influence spread. We keep a queue with the candidate seeds and their characteristics (line 2) and the nodes that have not been assigned to a seed yet (line 3). The characteristics of a candidate seed are the nodes it influences (line 5) and its influence spread (line 6), as defined by Eq. (16). Then we sort based on the influence spread (line 8) and proceed to include in the seed set a candidate seed that has the maximum influence spread. Once a seed is chosen, the nodes it influences cannot be reinfected (line 14). This ensures that the size of a candidate seed's influence set diminishes through the iterations (line 17) and computes the marginal gain.
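The greedy selection with lazy re-evaluation described above can be sketched as follows. This is not a transcription of Algorithm 1 but a minimal illustration of the same idea under assumptions: the diffusion probabilities are stored as a dictionary of dictionaries, the number of assignments per seed is given as a precomputed dictionary, a heap plays the role of the sorted queue, and the names `seed_spread` and `iminfector_greedy` are hypothetical.

```python
import heapq

def seed_spread(weights, lam, available):
    """Total weight of the lam heaviest nodes that are still unassigned,
    together with those nodes."""
    ranked = sorted((v for v in weights if v in available),
                    key=lambda v: weights[v], reverse=True)[:lam]
    return sum(weights[v] for v in ranked), ranked

def iminfector_greedy(D, lam, k):
    """Pick k seeds; nodes assigned to a chosen seed cannot be reassigned.

    D:   {seed: {node: diffusion probability}} (assumed layout)
    lam: {seed: number of nodes the seed is expected to influence}
    """
    available = {v for row in D.values() for v in row}
    # Heap entries: (negated spread, seed, number of seeds chosen when computed).
    heap = [(-seed_spread(row, lam[s], available)[0], s, 0) for s, row in D.items()]
    heapq.heapify(heap)
    seeds = []
    while heap and len(seeds) < k:
        _, s, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # The spread is up to date and is the largest one: select the seed.
            _, nodes = seed_spread(D[s], lam[s], available)
            seeds.append(s)
            available.difference_update(nodes)
        else:
            # Stale entry: recompute against the remaining nodes and re-queue.
            gain, _ = seed_spread(D[s], lam[s], available)
            heapq.heappush(heap, (-gain, s, len(seeds)))
    return seeds
```

Because the set of unassigned nodes only shrinks, a stale spread is always an upper bound on the current one, so a candidate whose recomputed value stays on top of the heap can be selected without re-evaluating the others.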
Experiments
Datasets
We assembled a set of sizable networks that can be used for influence maximization and are accompanied by a form of information diffusion, so that we can define the cascades. More specifically, we employ two social networks that have been heavily used in relevant studies and one bibliographical network that we extracted specifically for this task and deem suitable for data-based influence maximization. Table 1 gives an overview of the datasets used in our study.
Table 1: Overview of the datasets.

            Sina Weibo     Digg         MAG
Nodes       1,170,689      279,631      1,436,158
Edges       225,877,808    2,251,166    20,456,480
Cascades    115,686        3,553        181,020

Sina Weibo: A directed follower network where a cascade is defined by the first tweet and the list of retweets (?). We remove from the network nodes that do not appear in the cascades, in order to make the comparison between structural and diffusion-based methods fairer, since the evaluation relies on diffusions.

Digg: A directed friendship network derived from a social media website where a vote on a story is treated as a retweet and the first voter is regarded as the initiator of the cascade (?).

MAG Computer Science: We follow suit from (?) and define a network of authors with undirected coauthorship edges, where a cascade happens when an author writes a paper and other authors cite it. In other words, a coauthorship is perceived as a friendship, a paper as a post and a citation as a repost. In this case, we employ Microsoft Academic Graph (?) and filter to keep only papers that belong to computer science. We remove cascades with length less than 10.
Baseline Methods
To facilitate a thorough comparison, we employ metrics and algorithms that represent different approaches to the problem of influence maximization. In general, juxtaposing the effectiveness of structural and diffusion metrics with influence maximization algorithms is something missing from the literature, to the best of our knowledge, especially at the scale of the networks utilized.

K-cores: The top nodes in terms of their core number, as defined by the undirected core decomposition. This metric is extensively used for influencer identification and it is considered the most effective structural metric for this task (?).

AVG Cascade Size: The top nodes based on the average size of their cascades in the train set. This is a straightforward ranking of the nodes that has proven effective in the past (?).

IMM: An influence maximization algorithm that operates on the network and is considered state-of-the-art (?). The edge weights are set in accordance with the weighted cascade. Note that most traditional influence maximization algorithms, such as CELF, do not scale to the networks we have used for evaluation, which is the reason we did not use them.

IMINFECTOR: Our proposed model, with embedding size equal to 50, trained for 5 epochs with a learning rate of 0.1. The reduction percentage is set to 10 for Weibo and MAG (due to the computational demand) and 40 for Digg. There was no hyperparameter tuning of the neural network, which means there is room for significant improvement given enough time.
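Of the baselines listed above, the AVG Cascade Size ranking is simple enough to sketch directly. The snippet below is an illustration under assumptions: each train cascade is a list of nodes with the initiator first, ties are broken by node name, and the function name `top_by_avg_cascade_size` is hypothetical.

```python
from collections import defaultdict

def top_by_avg_cascade_size(train_cascades, k):
    """Rank initiators by the average size of the cascades they started."""
    sizes = defaultdict(list)
    for cascade in train_cascades:
        # cascade[0] is the initiator; the cascade length counts all its nodes.
        sizes[cascade[0]].append(len(cascade))
    average = {u: sum(lengths) / len(lengths) for u, lengths in sizes.items()}
    return sorted(average, key=lambda u: (-average[u], u))[:k]
```

Unlike the greedy methods, this ranking scores each initiator independently and therefore ignores any overlap between the audiences of the selected seeds.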
Evaluation Methodology
As mentioned above, we split each dataset into train and test cascades based on their time of occurrence. Each influence maximization approach will use the information on the train cascades or the underlying network to define a seed set. The train cascades amount for the first 80% of the whole set and the rest is left for testing. The test cascades are used for a twofold evaluation. The first is to compute how many of the predicted seeds have indeed started a cascade in the test set. This can be considered an adaptation of precision at metric utilized in recommender systems, for different seed set sizes . In the second metric, we evaluate the quality of the predicted seed set using the aforementioned DNI. We consider as influenced every node that participates in a test set diffusion initiated from one of the predicted seeds.
The main advantage of this evaluation is that, by measuring the size of the distinct set, potential overlaps between the diffusions of different seeds are taken into account. Although not devoid of assumptions, it is the most objective measure of a seed set’s quality (?).
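The evaluation protocol described above can be sketched in a few lines. The cascade representation (a dict with "time", "initiator" and "participants" keys) and the function names are hypothetical, chosen only for this example, and whether the participants of a cascade include its initiator is left to the data.

```python
def chronological_split(cascades, train_fraction=0.8):
    """Earliest `train_fraction` of the cascades for training, rest for test."""
    ordered = sorted(cascades, key=lambda c: c["time"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def seed_precision(seeds, test_cascades):
    """Fraction of predicted seeds that started at least one test cascade."""
    seeds = list(seeds)
    if not seeds:
        return 0.0
    initiators = {c["initiator"] for c in test_cascades}
    return sum(1 for s in seeds if s in initiators) / len(seeds)


def distinct_nodes_influenced(seeds, test_cascades):
    """DNI: number of distinct nodes in test cascades started by a seed."""
    seeds = set(seeds)
    influenced = set()
    for cascade in test_cascades:
        if cascade["initiator"] in seeds:
            # Set union counts a node once even if several seeds reach it.
            influenced.update(cascade["participants"])
    return len(influenced)


# Toy data: eight early cascades, then two later ones that form the test set.
cascades = [
    {"time": t, "initiator": "a", "participants": ["x"]} for t in range(8)
] + [
    {"time": 8, "initiator": "a", "participants": ["x", "y"]},
    {"time": 9, "initiator": "b", "participants": ["y", "z"]},
]
train, test = chronological_split(cascades)
precision = seed_precision(["a", "b", "q"], test)
dni = distinct_nodes_influenced(["a", "b", "q"], test)
```

In the toy data, node y is reached by both seeds a and b but contributes only once to the DNI, which is exactly the overlap effect the metric is meant to capture.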
Since our datasets differ significantly in terms of size, we have to use a different seed set size for each one. MAG, which has 205,839 initiators in the train set, is tested with a seed set size of 10,000; Weibo, with 24,748 initiators, with 1,000; and Digg, with 537 initiators, with 100.
This modification is crucial to the objective evaluation of the methods, because small seed sets favor simple methods.
For example, taking the top 100 nodes in terms of connectivity in a dataset like MAG would unavoidably work well because these authors are immensely successful. However, increasing the seed set size allows the real effect of influence overlap to take place, and eventually simplistic methods fall short of the ones that take more information into account.
The experiments were run on a machine with an Intel(R) Xeon(R) W-2145 CPU @ 3.70GHz, 252 GB of RAM and an Nvidia Titan V.
Results
Figures 5 and 6 show the precision and the quality of the predicted seed sets. The ranking metrics are outperformed in the majority of the tasks, with the exception of K-core, which surpasses IMM on Digg. Comparing IMM and IMINFECTOR, the latter is clearly superior on Digg, while on Weibo it performs better in terms of precision but worse in terms of quality. This happens because IMINFECTOR indeed finds influencers (precision), but the heuristic approach does not arrive at a sufficiently good solution in this case, as the seeds overlap with each other. In both cases, however, IMINFECTOR follows a much steeper trend, exhibiting the potential to surpass IMM if the seed set size increases. This effect is more prevalent in MAG, where both methods initially go toe to toe, with IMINFECTOR eventually prevailing in both precision and quality. This tendency might reveal that IMINFECTOR does indeed capture influence overlap, but that this is not visible in these seeds due to the size of the dataset. This is consistent with the effect mentioned above regarding the role that the size of the network plays in the influence overlap of the seed set. It also depends heavily on the type of the network.
Since the results on MAG are ambiguous, we performed an extra qualitative analysis by locating the authors proposed by both approaches and computing their h-index. The average h-index of IMINFECTOR’s seeds is 29.25, while that of IMM’s seeds is 23.22. We provide a top 10 list for both cases in Table 2.
In terms of computation, IMINFECTOR performs all steps (preprocessing, training and algorithm) in 489, 5,671 and 9,042 seconds for Digg, Weibo and MAG respectively, while IMM with preprocessing takes 15, 1,124 and 500 seconds. While IMM seems more efficient, we cannot compare the two fairly in this respect, as they rely on different types of input. Moreover, IMINFECTOR is implemented in Python while IMM is implemented in C++. As an indication of the burden of the propagation network, a Python implementation of inf2vec (?) takes almost 40 times the preprocessing time of INFECTOR on Weibo, which makes it prohibitive for MAG.
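For reference, the h-index of an author is the largest h such that the author has at least h papers with at least h citations each. A minimal sketch of the computation and of the averaging over a seed set follows; the citation counts are made up for illustration and the function name is an assumption of this example, not part of the paper's pipeline.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts per paper for three seed authors.
seed_citations = {
    "author_1": [10, 8, 5, 4, 3],
    "author_2": [25, 8, 5, 3, 3],
    "author_3": [],
}
scores = {author: h_index(c) for author, c in seed_citations.items()}
average_h = sum(scores.values()) / len(scores)
```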
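Table 2: Top 10 authors by h-index among the seeds proposed by IMM and IMINFECTOR on MAG.

| IMM | h-index | IMINFECTOR | h-index |
|---|---|---|---|
| Peer Bork | 166 | Eric Lander | 280 |
| Alan Evans | 155 | Karl Friston | 186 |
| Ruedi Aebersold | 151 | Francis Collins | 168 |
| Anil Jain | 148 | Peer Bork | 166 |
| Donald Schneider | 143 | Alan Evans | 155 |
| David Holmes | 139 | Todd Golub | 153 |
| Jiawei Han | 131 | Ruedi Aebersold | 151 |
| Tony Pawson | 127 | Anil K. Jain | 148 |
| Andrew Zisserman | 123 | Patrick Brown | 147 |
| Philip S. Yu | 120 | Donald Schneider | 143 |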
Conclusions
In this paper, we have proposed IMINFECTOR, a method to perform influence maximization using representations learned from diffusion cascades. The algorithm outperformed several methods based on a data-driven evaluation in three large-scale datasets. In this study, we employ purely diffusion-based or network-based methods and leave the hybrid approaches (?; ?; ?) as a next step, mainly due to their computational demand.
The proposed algorithm can be improved in multiple ways. The most obvious next step is hyperparameter tuning or the addition of another hidden layer. The second is to employ Munkres’s algorithm (?) to identify the optimum solution to the problem, given sufficient resources. Overall, the purpose of this work is primarily to examine the application of neural networks to the problem of influence maximization and secondarily to highlight the importance of data-driven evaluation, comparing with methods from different domains. We hope this will pave the way for more studies that approach influence maximization with machine learning.
Acknowledgements.
The Nvidia Titan V used for this research was generously donated by the NVIDIA Corporation.
References
 [Aral and Dhillon 2018] Aral, S., and Dhillon, P. S. 2018. Social influence maximization under empirical influence models. Nature Human Behaviour 2(6):375.
 [Bakshy et al. 2011] Bakshy, E.; Hofman, J. M.; Mason, W. A.; and Watts, D. J. 2011. Everyone’s an influencer: quantifying influence on Twitter. In International Conference on Web Search and Data Mining (WSDM), 65–74.
 [Bourigault, Lamprier, and Gallinari 2016] Bourigault, S.; Lamprier, S.; and Gallinari, P. 2016. Representation learning for information diffusion through social networks: an embedded cascade model. In International Conference on Web Search and Data Mining (WSDM), 573–582.
 [Budak, Agrawal, and El Abbadi 2011] Budak, C.; Agrawal, D.; and El Abbadi, A. 2011. Limiting the spread of misinformation in social networks. In International Conference on World Wide Web (The WebConf), 665–674.
 [Caruana 1997] Caruana, R. 1997. Multitask learning. Machine Learning 28(1):41–75.
 [Chen, Wang, and Wang 2010] Chen, W.; Wang, C.; and Wang, Y. 2010. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In International Conference on Knowledge Discovery and Data Mining (KDD), 1029–1038.
 [Cohen et al. 2014] Cohen, E.; Delling, D.; Pajor, T.; and Werneck, R. F. 2014. Sketch-based influence maximization and computation: Scaling up with guarantees. In International Conference on Information and Knowledge Management (CIKM), 629–638.
 [Du et al. 2013] Du, N.; Song, L.; Rodriguez, M. G.; and Zha, H. 2013. Scalable influence estimation in continuous-time diffusion networks. In Advances in Neural Information Processing Systems (NeurIPS), 3147–3155.
 [Evgeniou and Pontil 2004] Evgeniou, T., and Pontil, M. 2004. Regularized multi-task learning. In International Conference on Knowledge Discovery and Data Mining (KDD), 109–117.
 [Feng et al. 2018] Feng, S.; Cong, G.; Khan, A.; Li, X.; Liu, Y.; and Chee, Y. M. 2018. Inf2vec: Latent representation model for social influence embedding. In International Conference on Data Engineering (ICDE), 941–952.
 [Garimella et al. 2017] Garimella, K.; Gionis, A.; Parotsidis, N.; and Tatti, N. 2017. Balancing information exposure in social networks. In Advances in Neural Information Processing Systems (NeurIPS), 4663–4671.
 [Goyal, Bonchi, and Lakshmanan 2010] Goyal, A.; Bonchi, F.; and Lakshmanan, L. V. 2010. Learning influence probabilities in social networks. In International Conference on Web Search and Data Mining (WSDM), 241–250.
 [Goyal, Bonchi, and Lakshmanan 2011] Goyal, A.; Bonchi, F.; and Lakshmanan, L. V. 2011. A data-based approach to social influence maximization. Proceedings of the Very Large Data Bases Endowment (VLDB) 5(1):73–84.
 [Grover and Leskovec 2016] Grover, A., and Leskovec, J. 2016. node2vec: Scalable feature learning for networks. In International Conference on Knowledge Discovery and Data Mining (KDD), 855–864.
 [Gutmann and Hyvärinen 2010] Gutmann, M., and Hyvärinen, A. 2010. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In International Conference on Artificial Intelligence and Statistics (AISTATS), 297–304.
 [Islam et al. 2018] Islam, M. R.; Muthiah, S.; Adhikari, B.; Prakash, B. A.; and Ramakrishnan, N. 2018. Deepdiffuse: Predicting the ’who’ and ’when’ in cascades. In International Conference on Data Mining (ICDM), 1055–1060.
 [Kalimeris et al. 2018] Kalimeris, D.; Singer, Y.; Subbian, K.; and Weinsberg, U. 2018. Learning diffusion using hyperparameters. In International Conference on Machine Learning (ICML), 2425–2433.
 [Kempe, Kleinberg, and Tardos 2003] Kempe, D.; Kleinberg, J.; and Tardos, É. 2003. Maximizing the spread of influence through a social network. In International Conference on Knowledge Discovery and Data Mining (KDD), 137–146.
 [Lerman, Ghosh, and Surachawala 2012] Lerman, K.; Ghosh, R.; and Surachawala, T. 2012. Social contagion: An empirical study of information spread on Digg and Twitter follower graphs. arXiv preprint arXiv:1202.3162.
 [Leskovec et al. 2007] Leskovec, J.; Krause, A.; Guestrin, C.; Faloutsos, C.; VanBriesen, J.; and Glance, N. 2007. Cost-effective outbreak detection in networks. In International Conference on Knowledge Discovery and Data Mining (KDD), 420–429.
 [Li et al. 2017] Li, C.; Ma, J.; Guo, X.; and Mei, Q. 2017. DeepCas: An end-to-end predictor of information cascades. In International Conference on World Wide Web (The WebConf), 577–586.
 [Malliaros, Rossi, and Vazirgiannis 2016] Malliaros, F. D.; Rossi, M.-E. G.; and Vazirgiannis, M. 2016. Locating influential nodes in complex networks. Nature Scientific Reports 6:19307.
 [Munkres 1957] Munkres, J. 1957. Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics 5(1):32–38.
 [Panagopoulos, Malliaros, and Vazirgiannis 2018] Panagopoulos, G.; Malliaros, F. D.; and Vazirgiannis, M. 2018. Diffugreedy: An influence maximization algorithm based on diffusion cascades. In International Conference on Complex Networks and their Applications, 392–404.
 [Panagopoulos 2017] Panagopoulos, G. 2017. Multitask learning for commercial brain computer interfaces. In IEEE 17th International Conference on Bioinformatics and Bioengineering (BIBE), 86–93.
 [Pei, Morone, and Makse 2018] Pei, S.; Morone, F.; and Makse, H. A. 2018. Theories for influencer identification in complex networks. In Complex Spreading Phenomena in Social Systems. Springer. 125–148.
 [Qiu et al. 2018] Qiu, J.; Tang, J.; Ma, H.; Dong, Y.; Wang, K.; and Tang, J. 2018. DeepInf: Modeling influence locality in large social networks. In International Conference on Knowledge Discovery and Data Mining (KDD), 2110–2119. ACM.
 [Saito et al. 2009] Saito, K.; Kimura, M.; Ohara, K.; and Motoda, H. 2009. Learning continuous-time information diffusion model for social behavioral data analysis. In Asian Conference on Machine Learning (ACML), 322–337.
 [Sinha et al. 2015] Sinha, A.; Shen, Z.; Song, Y.; Ma, H.; Eide, D.; Hsu, B.-J. P.; and Wang, K. 2015. An overview of Microsoft Academic Service (MAS) and applications. In International Conference on World Wide Web (The WebConf), 243–246.
 [Tang, Shi, and Xiao 2015] Tang, Y.; Shi, Y.; and Xiao, X. 2015. Influence maximization in near-linear time: A martingale approach. In International Conference on Management of Data (SIGMOD), 1539–1554.
 [Wang et al. 2017a] Wang, J.; Zheng, V. W.; Liu, Z.; and Chang, K. C.-C. 2017a. Topological recurrent neural network for diffusion prediction. In 2017 IEEE International Conference on Data Mining (ICDM), 475–484.
 [Wang et al. 2017b] Wang, Y.; Shen, H.; Liu, S.; Gao, J.; and Cheng, X. 2017b. Cascade dynamics modeling with attention-based recurrent neural network. In International Joint Conference on Artificial Intelligence (IJCAI), 2985–2991.
 [Zhang et al. 2013] Zhang, J.; Liu, B.; Tang, J.; Chen, T.; and Li, J. 2013. Social influence locality for modeling retweeting behaviors. In International Joint Conference on Artificial Intelligence (IJCAI), volume 13, 2761–2767.
 [Zhang, Lyu, and Zhang 2018] Zhang, Y.; Lyu, T.; and Zhang, Y. 2018. Cosine: Community-preserving social network embedding from information diffusion cascades. In AAAI Conference on Artificial Intelligence, 2620–2627.