How to Maximize the Spread of Social Influence: A Survey
Abstract
This survey presents the main results achieved for the influence maximization problem in social networks. The problem is well studied in the literature and, thanks to its recent applications, some of which are currently deployed in the field, it is receiving more and more attention in the scientific community. The problem can be formulated as follows: given a graph, with each node having a certain probability of influencing its neighbors, select a subset of vertices so that the number of nodes in the network that are influenced is maximized. Starting from this model, we introduce the main theoretical developments and computational results that have been achieved, taking into account different diffusion models describing how the information spreads throughout the network, various ways in which the sources of information could be placed, and how to tackle the problem in the presence of uncertainties affecting the network. Finally, we present one of the main applications that have been developed and deployed by exploiting the tools and techniques previously discussed.
1 Introduction
Influencing people is a very ancient issue, one that all populations in all eras have dealt with. Going back to Ancient Greece, there were the Sophists, philosophers and teachers, but also remarkable speakers: they were so well trained that they could take any position and defend any hypothesis. The Romans had the Oratori, like Cicero, involved in politics and very able to move public sentiment and influence people and governors to accept laws and rights or to fight for them. Then, in the Middle Ages, preachers were so inspiring that they could move entire masses of people to fight in foreign lands, as happened during the Crusades. Nowadays, the problem of influencing people still has a political aspect, but it has also acquired a very important commercial one, e.g., companies invest large amounts of money in advertising to sell more products and become more and more popular and wealthy. From a scientific (and computer science) point of view, the Influence Maximization Problem (IMP) was formalized for the first time in 2003 [30].
1.1 From Mass to Direct to Viral Market
Marketing has always been fertile soil for applications of theories coming from Knowledge Discovery and Data Mining (KDD) [20]. Instead of focusing on mass marketing, where products are proposed to all possible consumers without taking into account any further information, such as their preferences, KDD studied problems related to direct marketing, i.e., among all the potential customers, first select the ones that seem more willing to buy the product that is being promoted [36]. Behavioral models have been built to predict the behavior of customers, e.g., whether they will buy some product while not purchasing another one, exploiting information about the clients themselves and their previous purchases [19, 47]. However, such an approach has a strong drawback, namely, it assumes that someone purchases a product taking into account only her own preferences, while people are always influenced by others' opinions, tastes and behaviors. This kind of marketing, based on word-of-mouth, is called viral because it spreads like an epidemic, from person to person [18, 53]. Viral marketing has two remarkable features. First, it is very cost-effective, since the customers propagate the benefits of a product without the direct intervention of a company. Second, because of the way information diffuses, it does not require building a profile for each customer; instead, the problem can be reformulated as figuring out who the most influential customers are, i.e., people that other people will listen to, and will be guided by in their purchases.
While nowadays it may seem obvious, such an approach has been a breakthrough since ignoring these effects and just relying on the preferences of the single user clearly leads to suboptimal solutions. Thus, from a practical point of view, the intrinsic value of a customer is not the only one that defines her: even though, if considered alone, her value could be smaller than the one of another customer, including the potential value of her network and the possibility of spreading the word and influencing others, may significantly increase her actual value for some marketing campaign.
Unfortunately, quantifying the value of the neighbors of a customer is a hard task since it does not depend only on the person but, potentially, on the entire network. This is why all the studies in this research field aim at finding interesting individuals in social networks by exploring and exploiting both the relationships each person has and the topology of the network.
1.2 Structure of the Paper
In this work, we focus on the Influence Maximization Problem in social networks. The survey on models and algorithms for social influence analysis briefly discusses IMP [54, Section 3], presenting some diffusion models and a few applications. Our paper focuses entirely on IMP, exploring the most important diffusion models that have been proposed, the algorithms developed to solve the problem ever faster, and the refinements introduced to design models ever closer to reality. The rest of the paper is organized as follows.

Section 2 presents the problem of influence maximization in social networks. First, in Section 2.1, we formally present the basic version of the problem of influence maximization. Then, we focus on different extensions that have been developed. Specifically, Section 2.2 reports the main diffusion models that have been proposed, Section 2.3 analyzes different ways of placing the diffusion seeds, and finally, in Section 2.4, we present the main uncertainties that may affect the structure of the network.

Section 3 provides the most important results achieved for the different scenarios presented in the previous section.

Section 4 presents one of the most interesting applications currently deployed, developed by implementing methods and techniques presented in the paper. Specifically, the problem is preventing the diffusion of HIV among youth by identifying peer leaders in communities so that, after being trained, they can influence their peers and spread positive information on how to prevent contracting the disease.

Finally, Section 5 concludes the work and proposes some future directions that could be undertaken.

Appendix A reports the main symbols adopted throughout the work.
2 Maximizing the Spread of Social Influence
We introduce the problem of influence maximization as it was formulated for the first time in 2003 in [30, Section 2.1]. In the rest of the section, we present the main contributions that have been designed building on this model, grouping them according to the direction in which they developed.
2.1 Influence Maximization Problem
Here, we introduce the main elements characterizing a social network and the problem of maximizing the spread of the information.
The goal of the influence maximization problem is to maximize the number of individuals that are reached by some information. For example, we can consider a social network, where the different users are connected by friendship relationships. Here, one may be interested in studying how the adoption of some product is influenced by the fact that some friends have already adopted it. Another fundamental element is the way in which the influence diffuses in the network, moving from one node to another. These rules are called diffusion models, and we will present several of them, developed to be more and more realistic [30, 37, 62].
People in the network are usually divided into two main categories: active users, who have been reached by the information, and inactive users, who are not aware of the information yet, but may be influenced with positive probability. Active users can influence inactive users, changing their status and activating them. One possible way for a node to be activated is that one of its neighbors has been activated and thus, in the next time window, will try to influence it. Once a node has been influenced, it remains active until the end of the process and, unless differently specified, each node tries to influence its neighbors only in the time instant after its activation.
Formally, the social network is modeled as a directed graph $G = (V, E)$, where each node $v \in V$ represents an individual in the network and each edge $(u, v) \in E$ represents the connection between agents $u$ and $v$. We denote $n = |V|$ and $m = |E|$.
Each edge $(u, v) \in E$ is characterized by an influence probability $p_{uv} \in [0, 1]$, i.e., the probability that $u$ succeeds in influencing $v$.
The decision version of the problem can be written as follows.
Definition 1 (Influence Maximization Problem (IMP)).
The decision version of Influence Maximization Problem is defined as follows.

INSTANCE: a graph $G = (V, E)$, an influence probability $p_{uv}$ for each $(u, v) \in E$, a diffusion model $M$, an integer $k$.

QUESTION: is there a subset $S \subseteq V$, with $|S| \le k$, such that, placing the seeds in $S$, according to $M$, all the nodes in the network are influenced?
Next, we present the main features that have been developed to enhance the basic model just described. More precisely, Section 2.2 introduces the different diffusion models that have been adopted, starting from the most common ones, e.g., Independent Cascade model, generalizing them, and introducing more realistic ones. Then, in Section 2.3, we discuss the main methods that have been proposed to place the different seeds in the network. Finally, Section 2.4 presents variations that have been proposed on the structure of the graph itself, e.g., how to redefine the model if the network itself is unknown.
2.2 Diffusion Models
A fundamental component of IMP is the diffusion model, i.e., the way in which information diffuses throughout the network. In this section, we introduce the most adopted ones, presented in [5, 30, 37, 62].
Cascade Models
The influence process of the Independent Cascade (IC) diffusion model works according to a discrete representation of the passing of time. Let us suppose that the set of seeds $S$ has been defined. At time instant $t = 0$, the nodes in $S$ are activated. Then, at $t = 1$, each seed $u$ may activate each of its neighbors $v$ with probability $p_{uv}$, which depends on the strength of their connection. In general, at time instant $t$ there will be a set $A_t$ of active nodes. At $t + 1$, each node $u \in A_t$ will have a single chance to influence each of its inactive neighbors $v$ with probability equal to the weight $p_{uv}$ of the edge connecting them, thus independent of the activation history. If $u$ succeeds, then $v$ is added to the set of active nodes $A_{t+1}$. However, if $u$ fails, no further attempts by $u$ at influencing $v$ are possible. The diffusion process ends when no new nodes are added, i.e., no other activations are possible. Notice that, because of the single opportunity each node has to influence its neighbors and the fact that such an influence depends on $p_{uv}$, the process may end even though not all the nodes in the network have been influenced.
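The process above can be sketched as a short Monte Carlo simulation; the adjacency-dict format and function name below are illustrative assumptions of ours, not taken from the survey.

```python
import random

def simulate_ic(graph, seeds, rng=None):
    """One run of the Independent Cascade process.

    graph: dict mapping each node u to a dict {v: p_uv} of out-neighbors
    (a hypothetical adjacency format chosen for this sketch).
    Returns the set of nodes that are active when the process stops.
    """
    rng = rng or random.Random(0)
    active = set(seeds)
    frontier = list(seeds)              # nodes activated at the previous step
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p_uv in graph.get(u, {}).items():
                # each edge (u, v) gets exactly one activation attempt
                if v not in active and rng.random() < p_uv:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier        # stop when no new activations occur
    return active
```

Since the process is stochastic, the expected spread is usually estimated by averaging many such runs.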
This model can be generalized: we allow the probability that $u$ succeeds in activating one of its neighbors $v$ to depend also on the neighbors of $v$ that have already tried spreading the influence. Formally, we can define an incremental function $p_v(u, X)$, with $X$ being a subset of $v$'s neighbors and $u \notin X$. The diffusion process is the following. When $u$ tries to activate $v$, it succeeds with probability $p_v(u, X)$, where $X$ is the set of neighbors that have already tried and failed to activate $v$. Of course, we consider only cascade models defined by incremental functions that are order-independent, i.e., if $u_1, \dots, u_l$ try to activate $v$, then the probability that $v$ is activated is independent of the order in which such attempts have been performed. This model is known as the Generalized Cascade model.
Given the intractability of such models in most situations (as we will see in Section 3), we can add a condition to the cascade models to make them tractable, defining the Decreasing Cascade model, in which the probability that some node $u$ influences some vertex $v$ is nonincreasing as a function of the set of nodes that have previously tried to influence $v$. Formally, this means that $p_v(u, X) \ge p_v(u, Y)$ whenever $X \subseteq Y$. This is known as the diminishing influence condition.
There is a further extension of IC, namely ICN, i.e., Independent Cascade with Negative states, which was introduced for the first time in [5]. Here, the nodes may have three states: positive, neutral and negative. At the beginning, each node is neutral. As for IC, a node can be activated at any turn $t$, but in this case it can be influenced either positively or negatively. Thus, at $t = 0$, each seed is positively activated with some probability $q$ and negatively with probability $1 - q$, independently of the others. At $t + 1$, for each neutral node $v$, a permutation of all its neighbors activated at time $t$ is computed. Then, according to such permutation, those nodes try to influence $v$ positively or negatively according to their own status, each with probability $p_{uv}$, i.e., the influence that $u$ has on $v$. As for IC, the process ends when a fixed point is reached, i.e., there are no more activations in the current time instant.
Threshold Models
According to the Linear Threshold (LT) diffusion model, each vertex $v$ has a threshold $\theta_v$ representing the total weight its neighbors should achieve in order for $v$ to be activated. The diffusion process develops in a discretized, deterministic fashion. At the beginning, the seeds are the only active nodes. Then, at step $t$, all the inactive nodes $v$ for which the following condition holds:

$$\sum_{w \in N(v) \cap A_{t-1}} b_{vw} \ge \theta_v \qquad (1)$$

are activated and added to $A_t$, with $N(v)$ being the neighbors of $v$ and $b_{vw}$ the influence weight that neighbor $w$ exerts on $v$. Active nodes remain active. The process stops when no more nodes can be activated. From a qualitative point of view, the threshold may represent the tendency of different people to be convinced of something, e.g., adopting a particular position on a political issue or embracing a new technology.
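Since the LT process is deterministic once the thresholds are fixed, it can be sketched with a simple fixed-point loop; the dict-based format below is an illustrative assumption.

```python
def simulate_lt(weights, thresholds, seeds):
    """Deterministic Linear Threshold diffusion.

    weights: dict mapping each node v to a dict {w: b_vw} giving the
    influence weight of in-neighbor w on v (illustrative format).
    thresholds: dict mapping each node v to its threshold theta_v.
    """
    active = set(seeds)
    changed = True
    while changed:                      # one pass per diffusion step
        changed = False
        for v, incoming in weights.items():
            if v in active:
                continue
            # condition (1): total weight of active in-neighbors >= theta_v
            if sum(b for w, b in incoming.items() if w in active) >= thresholds[v]:
                active.add(v)
                changed = True
    return active
```

For example, a node with threshold 1.0 and two in-neighbors of weights 0.6 and 0.5 activates only when both are active.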
This linear model can be generalized, relaxing the assumption that each individual gathers influence only linearly. Specifically, we can associate to each node $v$ an arbitrary monotonic function of the set of its already active neighbors. Consequently, each node has a monotonic threshold function $f_v : 2^{N(v)} \to [0, 1]$, such that $v$ is activated at step $t$ when $f_v(A_{t-1} \cap N(v)) \ge \theta_v$, where $N(v)$ are the neighbors of $v$. If this condition holds, then $v$ results activated and is added to $A_t$. The diffusion process is the same as before. This model is known as the Generalized Threshold model.
Triggering Model
In the Triggering diffusion model, each node $v$ independently chooses a random triggering set $T_v$ according to some distribution over subsets of its incoming neighbors. At the beginning, a set of seeds is activated. Then, an inactive node $v$ becomes active in step $t$ if it has a neighbor in its chosen triggering set $T_v$ that is active in step $t - 1$.
Compared to the Threshold model, $v$'s threshold has been replaced by a latent subset $T_v$ of neighbors whose behavior actually affects $v$. A nice way of looking at the model is by distinguishing the edges between live and blocked, depending on whether they belong to a triggering set or not. Specifically, if $u \in T_v$, i.e., the triggering set of node $v$, then the edge $(u, v)$ is dubbed live, otherwise it is blocked. Since our goal is maximizing the number of active nodes at the end of the diffusion process, we are saying that $A_t$, i.e., the set of active nodes at time $t$, is the set of nodes $v$ such that $v$ is reachable from the seed set via a path consisting entirely of live edges.
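The live-edge view can be sketched in two steps: sample a live-edge graph, then collect everything reachable from the seeds. The per-edge coin flip below corresponds to the special case where each incoming edge enters the triggering set independently; formats and names are our own illustrative choices.

```python
import random

def live_edge_sample(graph, rng=None):
    """Flip a biased coin per edge: keep (u, v) as 'live' with prob. p_uv.

    graph: dict mapping u to {v: p_uv} (illustrative format).
    """
    rng = rng or random.Random(0)
    return {u: [v for v, p in nbrs.items() if rng.random() < p]
            for u, nbrs in graph.items()}

def reachable(live, seeds):
    """Active nodes = everything reachable from the seeds via live edges."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for v in live.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen
```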
Heat Diffusion Model
This interesting diffusion model has been proposed for the first time in [37]. The rationale behind is exploiting the physical phenomenon of how the heat diffuses from a point with a higher temperature to a point with a lower temperature.
The same principle can be applied to social influence and the spreading of, for example, innovations: in this scenario, the innovators are the heat sources and have a high temperature, so that their ideas diffuse to other people throughout the network, as heat diffuses from point to point in some medium.
Formally, the Heat Diffusion Model (HDM) can be described as follows.
Let $\theta_i$ be the activation threshold of node $v_i$, and let us denote with $f_i(t)$ the heat at node $v_i$ at time instant $t$. Let us assume that the heat distribution starts at time $t = 0$ and that, at time $t$, each node $v_i$ receives some amount of heat from each of its neighbors during the period $\Delta t$. It is reasonable to assume that the heat received by $v_i$ will be proportional to the heat difference between $v_i$ and its neighbors and to the time $\Delta t$. Writing the resulting difference equation over the heat vector $f(t)$ in matrix form, we can formulate this as:

$$\frac{f(t + \Delta t) - f(t)}{\Delta t} = \alpha H f(t)$$

where $\alpha$ is the heat diffusion coefficient and $H$ is defined as:

$$H_{ij} = \begin{cases} 1 & \text{if } (v_j, v_i) \in E, \\ -d(v_i) & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}$$

In the above formula, $d(v_i)$ is the degree of node $v_i$.
Solving this equation in the limit $\Delta t \to 0$, we finally get:

$$f(t) = e^{\alpha t H} f(0)$$

where $e^{\alpha t H}$ is called the diffusion kernel, since the heat diffusion process continues infinitely from the initial diffusion at $t = 0$. If the amount of heat received by $v_i$ is at least $\theta_i$, then the node is activated.
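As a small numeric sketch (ours, not from the survey), the kernel $e^{\alpha t H}$ can be approximated with a truncated Taylor series $\sum_k (\alpha t H)^k / k!$, avoiding any linear-algebra dependency:

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def heat_at(H, f0, alpha, t, terms=30):
    """Approximate f(t) = e^{alpha*t*H} f(0) via a truncated Taylor series."""
    n = len(H)
    scaled = [[alpha * t * h for h in row] for row in H]
    kernel = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    power = [row[:] for row in kernel]   # holds (alpha*t*H)^k / k!
    for k in range(1, terms):
        power = mat_mul(power, scaled)
        power = [[x / k for x in row] for row in power]
        kernel = [[kernel[i][j] + power[i][j] for j in range(n)]
                  for i in range(n)]
    return [sum(kernel[i][j] * f0[j] for j in range(n)) for i in range(n)]
```

For two mutually connected nodes with all the heat initially at the first one, the heat flows toward the uniform distribution as $t$ grows, while the total heat is conserved.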
Dynamic Diffusion Model
This model has been proposed to enrich the diffusion process by making it dynamic [62]. The novelty here is that the authors adopt the exponential distribution to model the propagation rate, since different edges may have different rates, and these may change over time. Formally, for the Dynamic Diffusion model (DynaDiffuse), let us consider two adjacent nodes, say $u$ and $v$. Then, the influence diffusion probability is $1 - e^{-\gamma_{uv} \tau}$, where $\tau$ is the timespan after $u$ has been influenced and $\gamma_{uv}$ is the propagation rate. Differently from the other models, here each node tries repeatedly to influence its neighbors. With these elements, we can infer the diffusion network, and a Continuous Time Markov Chain can be built, provided an initial set of vertices $S_0$.
Definition 2 (Continuous Time Markov Chain (CTMC)).
A CTMC is a tuple $(Q, q_0, R, L)$, where $Q$ is a set of states and $q_0$ the initial one. $R$ is a transition rate matrix assigning rates to pairs of states, with $R(q, q')$ used as the rate parameter of the exponential distribution governing the transition from $q$ to $q'$, for $R(q, q') > 0$. For a fixed finite set of action labels, $L$ assigns such labels to every transition with $R(q, q') > 0$.
To represent the diffusion according to DynaDiffuse, we build a CTMC. Specifically, each state has dimensions and an associated boolean value saying whether or not the node has been activated. The construction is as follows.

Create an initial state for $S_0$, the initial set of active vertices.

Then, iteratively create all other reachable states in $Q$ and the corresponding transitions in $R$. If some states are already in $Q$, just add the transitions.

Repeat the previous point until there is nothing more to add.

For a transition from a state $q$ to a state $q'$ that activates a new node, the transition rate is computed as the sum of the propagation rates $\gamma$ of the edges that can influence the new node.
The last step is introducing additional variables to model the dynamic properties of the network. Specifically, local variables are introduced to model the degree of the effects on the diffusion network by a positive function. In a CTMC, transitions are divided into two groups: internal transitions, which define how these dynamic factors evolve stochastically, and external transitions, which are associated with action labels that will be used for synchronization among CTMCs.
A CTMC evolves in time according to its action labels. Moreover, transitions with different labels in different CTMCs may occur independently.
With this formalism, we can model the most common dynamic characteristics. Thus, we get a labeled CTMC for the inferred diffusion network and a series of CTMCs for the dynamics. Finally, we denote with $\|$ the parallel composition of two labeled CTMCs.
Definition 3 (Dynamic Diffusion (DynaDiffuse) model).
Given a CTMC $M$ for a network $G$, a set of propagation rates $\gamma_{uv}$ for each $(u, v) \in E$, an initial set of vertices $S_0$ and a set of dynamic characteristics $D_1, \dots, D_h$, where each $D_i$ is a CTMC model whose transition matrix represents a positive function, DynaDiffuse is a synchronous CTMC model defined as $M \| D_1 \| \cdots \| D_h$.
2.3 Seeds Placement
In this section, we introduce the main models that have been proposed to extend the possible ways of placing the diffusion seeds in the network. Specifically, we first present a model in which only jump and crawl operations are allowed, and then we will deal with the adaptive seeding problem.
Jumping and Crawling
Considering the ways in which individuals can be reached by other people on social networks, we analyze scenarios where we are allowed to perform only two operations, which are the main ones that may be undertaken in social networks [3]. The first action is crawling: if a user has been discovered or added to our friends, we are usually also provided the links to her neighbors, e.g., friends/followers. The second action we can perform is jumping: in general, we are given the possibility of searching for some other node which may be far from the discovered ones, e.g., by means of a search bar allowing us to find new users. While in real-world applications such a search is not random, i.e., different users have different probabilities of being searched for, for simplicity we assume that, once we execute the jump, each node has the same probability of being selected. In such settings, the goal is still finding the most influential nodes of the network; however, such a search is performed w.r.t. two main parameters. First, the degree of the nodes is considered, with the aim of finding nodes with the highest degree. Formally, let $\Delta$ be the maximum degree over the different nodes. Then, given $\epsilon > 0$, we want to find a vertex $v$ such that $deg(v) \ge (1 - \epsilon)\Delta$.
Besides considering the maximum degree $\Delta$, the authors also investigated the clustering coefficient, aiming at finding nodes with the highest one.
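A minimal sketch of the two operations (a naive heuristic of ours, assuming uniform jumps and one-step crawls; the algorithms analyzed in [3] are more refined):

```python
import random

def jump_and_crawl_max_degree(adj, queries, rng=None):
    """Track the highest-degree node seen across `queries` uniform jumps,
    crawling the neighbor lists of each node we land on.

    adj: dict mapping each node to the set of its neighbors.
    """
    rng = rng or random.Random(0)
    nodes = list(adj)
    best = None
    for _ in range(queries):
        u = rng.choice(nodes)                 # jump: uniform random node
        if best is None or len(adj[u]) > len(adj[best]):
            best = u
        for v in adj[u]:                      # crawl: inspect u's neighbors
            if len(adj[v]) > len(adj[best]):
                best = v
    return best
```

On a star graph, even a single jump suffices: either we land on the center, or a crawl from a leaf reveals it.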
Definition 4 (Clustering coefficient).
Let $v$ be a vertex with degree $d_v$ of a graph $G$. Then, the Clustering Coefficient (CC) of $v$ is:

$$cc(v) = \frac{K_3(v)}{\binom{d_v}{2}}$$

where $K_3(v)$ is the number of cliques of dimension 3 (i.e., triangles) that $v$ belongs to.
Informally, given some vertex, CC measures how densely connected its neighbors are, i.e., their tendency to form a clique.
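Definition 4 translates directly into code; the adjacency-set format below is an illustrative assumption.

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """cc(v) = (# triangles through v) / C(deg(v), 2).

    adj: dict mapping each node to the set of its neighbors.
    """
    nbrs = adj[v]
    d = len(nbrs)
    if d < 2:
        return 0.0                       # no pair of neighbors to close
    triangles = sum(1 for x, y in combinations(nbrs, 2) if y in adj[x])
    return triangles / (d * (d - 1) / 2)
```

In a triangle every vertex has coefficient 1, while the middle vertex of a path has coefficient 0.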
Finding the node with the highest CC may not be very informative, since it may happen that the vertex with the highest CC has only a few neighbors, thus not being interesting for finding potentially influential nodes in the network. This is why we look for nodes that have both a high CC and a high degree. Specifically, given some degree lower bound $d$, we want to find a vertex of degree not smaller than $d$ whose CC approximates the maximum CC among all vertices of degree $d$ or bigger.
Definition 5 (Approximation of CC).
Given a graph $G$ with $n$ vertices and some degree value $d$, let $v^*$ be the vertex with the highest CC among vertices of degree $d$ or more. Then, a vertex $v$ is an approximation to the maximum CC if $deg(v) \ge d$ and $cc(v) \ge (1 - \epsilon)\, cc(v^*)$, for some $\epsilon \in (0, 1)$.
Adaptive Seeding
Let us consider another interesting way of placing seeds, based on the so-called friendship paradox, first observed in [21].
Proposition 1.
In any network, the expected degree of a node is bounded from above by the expected degree of a neighbor.
This result says that, while the great majority of nodes will be ineffective in terms of maximizing the diffusion of the influence, selecting their neighbors could be more convenient. Thus, instead of investing the entire budget to immediately select possible seeds, it can be better to adopt a two-stage seeding approach [50]. First, some budget is used to select a starting set of seeds, making their neighbors accessible. Then, the remaining budget is used to select a set of other seeds from such a larger pool. Of course, the best policy is the one that, while minimizing the budget spent on the first chosen nodes, also makes the other most influential nodes of the network accessible. Observe that, despite the first stage being an actual problem of influence maximization, the second stage is not, since it models a word-of-mouth process that happens without any incentive.
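As a toy illustration (a naive heuristic of our own, not the adaptive-seeding algorithm of [50]), one can split the budget, unlock the neighborhoods of a first batch of nodes, and then spend the rest on the best unlocked neighbors:

```python
def two_stage_seeding(adj, influence, budget, split=0.5):
    """Naive two-stage seeding sketch.

    adj: dict mapping each node to the set of its neighbors.
    influence: callable scoring a single node (hypothetical estimate).
    """
    k1 = max(1, int(budget * split))
    first = sorted(adj)[:k1]                  # stage 1: unlock neighborhoods
    pool = set().union(*(adj[u] for u in first)) - set(first)
    # stage 2: pick the best unlocked neighbors by estimated influence
    second = sorted(pool, key=influence, reverse=True)[:budget - k1]
    return set(first), set(second)
```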
Before proceeding to the next section, we want to point out a final remark about the multi-stage nature of this approach. Given such a two-stage approach, one could wonder whether a multi-stage approach could be beneficial, e.g., performing the selection in three or more stages. However, the authors empirically show that increasing the number of stages provides only a very marginal improvement in the quality of the solution w.r.t. the time required for computing it. Moreover, they conjecture that optimizing a multi-stage process is computationally hard.
2.4 Network Uncertainties
In this section, we present scenarios involving uncertainties about the social network. Specifically, we will first focus on maximizing the spread of social influence in unknown graphs [39]. Then, we will study IMP when there is uncertainty on the probability function $p$. On one side, we will present a robust approach [6]; on the other, we will discuss how to deal with this problem in an online fashion, where data become available only as the diffusion process develops, adopting a multi-armed bandit approach [58].
Unknown Network
The novel feature we present here is tackling IMP without knowing the structure of the social network [39]. Most existing algorithms assume that the entire topology of the network is given, and so the goals are only scalability for the exact approaches and the quality of the solution for the approximate ones. However, in these settings an algorithm to compute the best placements for the seeds should also explore the network in an intelligent way in order to discover the presence of the various edges.
We are given a graph $G$ modeling the network, but its topology is unknown, and only a small number of probes is allowed to obtain hints about it. There are $T$ rounds, and in each round a limited number of nodes can be probed, i.e., when a node $v$ is probed, it returns the list of its neighbors. Then, as is customary, $k$ nodes are selected as seeds and, at the end of the influence process, the number of active nodes is computed. The goal is to maximize the expected number of influenced nodes at the $T$-th round, considering that, at each round, both the probing and the influence-spreading results of the previous rounds are available. Such an assumption is relaxed in [60].
Uncertainty on the Influence
Another type of uncertainty that has been studied concerns the weights of the edges, i.e., the influence power that individuals have on each other. In other words, we know some edges exist, but we do not know how intense the influence spread along them is. We present two main approaches to two slightly different problems. First, we present the online version of the problem [58], pointing out the new features w.r.t. the previous models and the techniques that should be employed to solve it. Then, we discuss a robust approach to the problem [6], whose goal is maximizing the worst-case ratio between the expected number of influenced nodes reached by our choice for the placement of the seeds and the one reached by the optimal choice.
Online Influence Maximization
Up to now, we considered models that have complete information on the structure of the network. However, in reality, it is very difficult, if not impossible, to access this kind of information or to quantify it accurately. To avoid this assumption, in [58], the authors adopt a novel approach based on the combinatorial multi-armed bandit paradigm [9], estimating the probabilities on the edges round by round, selecting a different seed set each time. The goal is to minimize the accumulated regret incurred by choosing suboptimal seed sets over multiple rounds. The authors consider two possible feedbacks: edge-level feedback, which allows one to observe how the influence spread along the different edges throughout the network, and node-level feedback, which assumes only knowing whether an individual has become active or not, without knowing who influenced it. We now report some information about Combinatorial Multi-armed Bandits (CMAB) that will be useful in the following.
In the CMAB framework there are $m$ arms, each associated with a random variable $X_i(t)$ indicating the reward of triggering arm $i$ on round $t$. $X_i(t)$ is defined on $[0, 1]$ and is independently and identically distributed according to some unknown distribution with mean $\mu_i$. At each round $t$, a set of arms, called a superarm and denoted by $A$, is played, triggering all the arms in the set. Moreover, some other arms may be probabilistically triggered. Let $p_i^A$ be the triggering probability of arm $i$ when the superarm $A$ is played (of course, $p_i^A = 1$ if $i \in A$). The reward obtained at each round is a function of the rewards of the arms that have been triggered in that round. Each time an arm is triggered, we observe its reward and update its mean estimate $\hat{\mu}_i$. The superarm that is expected to give the highest reward is selected in each round by an oracle, which takes as input the current mean estimates and outputs an appropriate superarm $A$.
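For intuition, a CUCB-style algorithm keeps, for every arm (here, an edge), an optimistic upper-confidence estimate that is fed to the oracle; the sketch below assumes the confidence radius $\sqrt{3\ln t / (2T_i)}$ commonly used by combinatorial UCB algorithms.

```python
import math

def ucb_estimate(successes, trials, t):
    """Optimistic estimate of an arm's mean after `trials` observations.

    t: current round index (t >= 1). Unobserved arms get the maximal
    estimate 1.0 so that they are eventually explored.
    """
    if trials == 0:
        return 1.0
    mean = successes / trials
    bonus = math.sqrt(3 * math.log(t) / (2 * trials))  # confidence radius
    return min(1.0, mean + bonus)
```

The bonus shrinks as an arm accumulates observations, so heavily sampled arms are estimated close to their empirical mean while rarely sampled ones stay attractive.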
A Robust Approach
The main feature characterizing this new model is that the influence probabilities of the edges belong to intervals rather than being precise values. A classical approach in the literature would be applying learning methods to extract the edge probabilities [24, 42, 48, 49, 55]. However, due to data limitations, no learning method can extract the exact values of the edge probabilities. Thus, the best option is looking for estimates of such weights together with some confidence intervals. Even so, the uncertainty affecting the estimates may significantly influence the subsequent maximization problem in a negative way. The authors adopt a learning method that provides, as input for the maximization problem, intervals in which the true weight of each edge may lie [6].
To generate a random live-edge graph, we say that some edge $(u, v)$ is live if flipping a biased random coin with probability $p_{uv}$ returns success; otherwise, the edge is called blocked (which happens with probability $1 - p_{uv}$).
Formally, let $\Theta$ be the parameter space of the edge probabilities, with $\theta \in \Theta$ being the latent parameter vector.
Definition 6 (Robustness ratio).
For a seed set $S$, with $|S| = k$, the robust ratio under parameter space $\Theta$ is:

$$g(\Theta, S) = \min_{\theta \in \Theta} \frac{\sigma_\theta(S)}{\sigma_\theta(S_\theta^*)}$$

where $\sigma_\theta(S)$ is the expected number of influenced nodes at the end of the spreading process and $S_\theta^*$ denotes the optimal solution of size $k$ when the probability of every edge $(u, v)$ is $\theta_{uv}$.
We observe that, given $\Theta$ and a solution $S$, $g(\Theta, S)$ embodies the worst-case ratio, w.r.t. the number of influenced nodes, between $S$ and the optimal placement of $k$ seeds. Now we are ready to formulate the Robust Influence Maximization Problem.
Definition 7 (Robust Influence Maximization Problem (RIMP)).
The problem of Robust Influence Maximization is defined as:

INSTANCE: a graph $G = (V, E)$, weights $p_{uv}$ for each $(u, v) \in E$, a parameter space $\Theta$, and an integer $k$.

QUESTION: find a subset $S \subseteq V$, with $|S| = k$, such that the robust ratio is maximized, i.e., $S \in \operatorname{arg\,max}_{S' \subseteq V,\, |S'| = k} g(\Theta, S')$.
Notice that, when there is no uncertainty in the intervals, i.e., $\Theta$ collapses onto the true probabilities $p_{uv}$, RIMP coincides with its exact counterpart, IMP.
We highlight a final remark on the notion of robustness. Besides the meaning we just described, in [27] the term is adopted w.r.t. the guarantees that an algorithm may provide across a spectrum of different diffusion models and parameter settings. We will deepen this aspect in the following section.
3 Achievements and Results
In this section, we present the most important achievements and results developed for IMP and its extensions. Specifically, in Section 3.1, we start by reporting the fundamental results, and then, in Section 3.2, we focus on how this problem has been solved ever faster for the different diffusion models. Next, we discuss the results concerning the other main features characterizing IMP, namely, the seeds placement and the uncertainties that may affect the structure of the network, in Sections 3.4 and 3.3, respectively. To facilitate the reader, each paragraph has been named after the algorithm proposed by the various authors to tackle the different problems.
3.1 Fundamental Results
We start by studying the complexity of IMP, and then we discuss the first algorithm that was proposed. For the problem to be tractable, two fundamental assumptions are commonly made on the influence function $\sigma$, namely monotonicity and submodularity.
Definition 8 (Monotonic function).
A set function $f$ defined over the subsets of a domain is monotonic if either of the following conditions is satisfied.

If, for all $S \subseteq T$, $f(S) \le f(T)$, then $f$ is monotonically nondecreasing.

If, for all $S \subseteq T$, $f(S) \ge f(T)$, then $f$ is monotonically nonincreasing.
Definition 9 (Submodular function).
A set function $f$ defined over the subsets of a domain is submodular if the following condition is satisfied:

$$f(S \cup \{x\}) - f(S) \ge f(T \cup \{x\}) - f(T)$$

for all sets $S \subseteq T$ and all elements $x \notin T$.
In other words, given two sets such that the first one is included in the second one, it is more significant adding an element to the first set w.r.t. the second, since one additional unit will give a higher contribution if the set it is added to is small.
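This diminishing-returns property can be verified by brute force on small ground sets; the helper below and the coverage-function example in the test are illustrative, not from the survey.

```python
from itertools import combinations

def is_submodular(f, universe):
    """Check f(S∪{x}) - f(S) >= f(T∪{x}) - f(T) for all S ⊆ T, x ∉ T."""
    subsets = [frozenset(c) for r in range(len(universe) + 1)
               for c in combinations(universe, r)]
    for S in subsets:
        for T in subsets:
            if not S <= T:
                continue
            for x in universe:
                if x in T:
                    continue
                # small tolerance guards against float rounding
                if f(S | {x}) - f(S) < f(T | {x}) - f(T) - 1e-12:
                    return False
    return True
```

A coverage function (e.g., the number of items covered by a set of subsets, much like the number of nodes reached by a seed set) passes the check, while a superlinear function such as $f(S) = |S|^2$ does not.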
Notice that the influence function is monotonic by definition, i.e., adding an element to a set cannot cause the influence function to decrease: if we add another placement for a seed, the expected number of influenced nodes cannot decrease; in the worst case it stays constant. Moreover, the following results hold [30].
Theorem 1.
For any arbitrary instance of the Independent Cascade model, the influence function is submodular.
Theorem 2.
For any arbitrary instance of the Linear Threshold model, the influence function is submodular.
Now, we are ready to state two important theorems.
Theorem 3.
IMP is NP-hard for the Independent Cascade model.
Theorem 4.
IMP is NP-hard for the Linear Threshold model.
In both cases, the reduction is from the Set Cover problem, which reads as follows.
Definition 10 (Set Cover problem).
The decision problem of Set Cover is defined as:

INSTANCE: a universe of elements $U = \{u_1, \dots, u_n\}$, a collection $\mathcal{S} = \{S_1, \dots, S_m\}$ of subsets of $U$ whose union equals the universe, an integer $k$.

QUESTION: is there a subcollection of $\mathcal{S}$ whose union equals $U$ and whose cardinality is at most $k$?
Since solving IMP is NP-hard w.r.t. both diffusion models, the next step is understanding whether such a problem could be at least approximated and, if so, up to which factor. The key tool is the following classical result on the greedy maximization of submodular functions.
Theorem 5.
For a nonnegative, monotonic, submodular function $f$, let $S_k$ be a set of size $k$ obtained by selecting elements one at a time, each time choosing the element that provides the largest marginal increase in the function value. Let $S^*$ be a set that maximizes the value of $f$ over all $k$-element sets. Then, $f(S_k) \ge (1 - 1/e) \cdot f(S^*)$.
The above theorem means that the greedy procedure provides a $(1 - 1/e)$-approximation. Since we know that in both the IC and the LT models the influence function is nonnegative, monotonic and submodular, we have a positive approximation result.
Theorem 6.
Let $S$ be the current set of nodes in which seeds are put. Adopting a greedy approach to solve IMP, i.e., repeatedly adding to $S$ the node with the highest marginal expected number of influenced nodes, guarantees an approximate solution to within a factor of $(1 - 1/e - \varepsilon)$.
In other words, starting from the empty set, the algorithm repeatedly adds the node $v$ that maximizes $\sigma(S \cup \{v\}) - \sigma(S)$. The problem is that it is not clear how to evaluate $\sigma(S)$ in polynomial time; indeed, it has been proved that such a task is, in general, #P-complete, both for the IC and the LT models [7]. Nevertheless, we can obtain good approximations of $\sigma(S)$ by simulating the random choices and the diffusion process a sufficient number of times. Specifically, the Chernoff-Hoeffding bounds [26] imply the following result [31].
Proposition 2.
If the diffusion process originated from the seeds placed in $S$ is simulated independently a number of times polynomial in $n$, $1/\varepsilon$, and $\log(1/\delta)$, then the average number of activated nodes over these simulations is a $(1 \pm \varepsilon)$-approximation to $\sigma(S)$, with probability at least $1 - \delta$.
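As a sketch of the simulation-based estimate just described, the snippet below runs independent IC simulations and averages the number of activated nodes; the graph encoding and parameter names are our own assumptions, not from [31].

```python
import random

def simulate_ic(graph, seeds, rng):
    """One run of the Independent Cascade process.
    `graph` maps each node u to a list of (v, p) pairs: u tries to activate v
    with probability p, exactly once, right after u becomes active."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

def estimate_sigma(graph, seeds, runs=10000, seed=0):
    """Monte Carlo estimate of sigma(seeds): the average number of
    activated nodes over `runs` independent simulations."""
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs
```

With deterministic edge probabilities the estimate is exact, which makes the sketch easy to sanity-check before using it on random instances.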
Unfortunately, these results do not hold for every influence function.
Theorem 7.
In general, it is NP-hard to approximate IMP to within a factor of $n^{1-\varepsilon}$, for any $\varepsilon > 0$.
The greedy algorithm has been tested on a collaboration graph [30] w.r.t. three different models: LT, weighted IC, and IC with uniform probability on the edges.
The algorithm has been compared with the baseline of choosing random nodes and with two heuristics, the first based on nodes' degrees and the second on nodes' centrality in the network. More specifically, the high-degree heuristic chooses nodes in order of decreasing degree, while the second one selects nodes in order of increasing average distance to the other nodes in the network.
When the LT diffusion model is adopted, the greedy algorithm outperforms both the high-degree node heuristic and the central node heuristic by a significant margin. The main reason why both heuristics perform poorly is that they ignore the fact that many of the central or high-degree nodes may be clustered together, so that placing seeds in all of them is a waste of resources.
When the weighted IC model is adopted, all spread values are smaller than in the previous setting, but the trends are the same.
Finally, for the uniform IC model, the propagation probability on every edge has been set to a small uniform value. In this case, the network effects in the IC model with very small probabilities are much weaker than in the other models. Several nodes have degrees well exceeding 100, so the probabilities on their incoming edges are even smaller than in the weighted IC model. This suggests that the network effects observed for the LT and weighted IC models rely heavily on low-degree nodes as multipliers, even though targeting high-degree nodes is a reasonable heuristic.
These results show that knowing the dynamics of a network leads to significantly better results than relying only on the structure of the network itself.
3.2 Solving IMP Faster and Faster
In this section, we analyze the main improvements that have been proposed to enhance the computational speed required to solve IMP. Despite being very popular, the greedy approach adopted in [30] does not scale up to realistic problems. Thus, several heuristics have been designed through the years to deal with this computational issue.
One of the main reasons why the algorithm in [30] is slow is the computation of the influence function since, as proved in [7] for the Independent Cascade diffusion model, performing such a computation is #P-hard. This issue has been tackled by estimating the spread using Monte Carlo simulation or by using heuristics [8, 11, 7, 33].
We now present the main heuristics according to the diffusion model they have been designed for.
Independent Cascade model
Spm/sp1m First, [33] considers two special cases of the Independent Cascade model and provides approximation algorithms to deal with them.

Each node is activated only through the shortest paths from the initial active set. Such a model, called Shortest-Path Model (SPM), is a special case of the Independent Cascade model where only the most efficient information spread can occur.

SPM is generalized to the SP1 Model (SP1M), where each node $v$ has the chance to become active only at steps $t = d(S, v)$ and $t = d(S, v) + 1$, i.e., node $v$ cannot be activated through paths from $S$ to $v$ whose length differs from $d(S, v)$ and $d(S, v) + 1$. Here, $d(S, v)$ is the distance between the seed set $S$ and node $v$ in the graph.
For both models, they propose a more efficient greedy heuristic than the one proposed in [30], and provide the following result.
Theorem 8.
In the SPM and SP1M, for the greedy algorithm, the following result holds:
$$\sigma(S_k) \;\ge\; \left(1 - \frac{1}{e}\right) \sigma(S^*),$$
where $S_k$ is the $k$-element set obtained by the greedy algorithm while $S^*$ is the set that maximizes the value of $\sigma$ over all $k$-element subsets of $V$.
Celf The other major issue of [30] is that the algorithm is quadratic in the number of nodes. A first attempt to solve this problem has been made in [35]. Here the authors study the problem of outbreak detection, i.e., selecting nodes in a network so as to detect the spreading of information as quickly as possible; this is very close to our goal of finding the best placements for seeds spreading the information. They propose CELF (Cost-Effective Lazy Forward selection), based on a lazy evaluation of the objective function: by submodularity, the marginal contribution of a node can only decrease across iterations of the greedy algorithm, so if the contribution a node brought in a previous iteration is already smaller than the contribution of the current best node, that node need not be re-evaluated.
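A minimal Python sketch of the CELF lazy evaluation, assuming a black-box monotone submodular spread function; the heap bookkeeping shown here is our rendering of the idea, not the authors' code.

```python
import heapq

def celf(candidates, spread, k):
    """CELF lazy greedy. `spread(S)` must be monotone and submodular.
    Each heap entry stores (negated marginal gain, node, iteration at which
    that gain was computed); a stale gain is recomputed only when it reaches
    the top of the heap, which submodularity makes safe."""
    heap = [(-spread({v}), v, 0) for v in candidates]
    heapq.heapify(heap)
    seeds, value = set(), 0.0
    for it in range(k):
        while True:
            neg_gain, v, computed_at = heapq.heappop(heap)
            if computed_at == it:                 # gain is fresh: select the node
                seeds.add(v)
                value += -neg_gain
                break
            gain = spread(seeds | {v}) - value    # lazy re-evaluation
            heapq.heappush(heap, (-gain, v, it))
    return seeds, value
```

Run against a toy coverage function, two iterations pick the two non-overlapping sets while re-evaluating only the entries that surface at the top of the heap.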
RanCas/NewGreedyIC Next, [8] improves the greedy approach for the Independent Cascade model, proposing a heuristic that achieves better results than CELF. Moreover, the authors also introduce a new degree discount heuristic that improves the influence spread. Specifically, they propose a novel algorithm to simulate the cascade, called Random Cascade, RanCas($S$). It works as follows: let $S_i$ be the set of vertices that are activated in the $i$-th round, with $S_0 = S$. For any edge $(u, v)$ such that $u \in S_i$ and $v$ has not been activated yet, $u$ activates $v$ with probability $p(u, v)$ in the $(i+1)$-th round; if $v$ has several newly activated neighbors, their activation attempts are performed independently. The process continues until $S_i$ is empty.
Since in RanCas($S$) each edge is examined at most once, and the probability is the same in either direction, we can determine in advance whether each edge is selected for propagation or not, and remove all the edges not selected for propagation from $G$, obtaining a new random graph $G'$. Thus, by randomly generating $G'$ for $R$ times, we can select the next best candidate vertex.
In their new algorithm, NewGreedyIC, each random graph is used to estimate the influence spread of all vertices. Comparing the NewGreedyIC algorithm with CELF, there is a tradeoff in running time. For CELF, the first round is as slow as the original algorithm but, starting from the second one, each round may only need to explore a small number of vertices, and the exploration of each vertex is typically fast since RanCas($S$) usually stops after exploring a small portion of the graph. Conversely, in every round of the NewGreedyIC algorithm, we need to traverse the entire graph $R$ times to generate the random graphs. To combine the merits of both improvements, the authors compose an additional algorithm, called MixGreedyIC, in which, in the first round, NewGreedyIC is used to select the first seed and compute the influence spread estimates for all vertices, and then, in later rounds, the CELF optimization is adopted to select the remaining seeds. This algorithm experimentally shows the best running time.
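The live-edge idea behind RanCas/NewGreedyIC can be sketched as follows: flip every edge once to obtain a random graph, then measure reachability in it; this is an illustrative Python rendering under our own data layout, not the authors' implementation.

```python
import random

def live_edge_graph(edges, rng):
    """Flip each directed edge (u, v, p) once: keep it with probability p."""
    return [(u, v) for u, v, p in edges if rng.random() < p]

def reachable_size(live_edges, start):
    """Number of nodes reachable from `start` in the live-edge graph (DFS)."""
    adj = {}
    for u, v in live_edges:
        adj.setdefault(u, []).append(v)
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen)

def estimate_spreads(nodes, edges, rounds=200, seed=0):
    """Average reachable-set size of every node over `rounds` random
    live-edge graphs: one graph serves to estimate the spread of all nodes."""
    rng = random.Random(seed)
    totals = {v: 0 for v in nodes}
    for _ in range(rounds):
        g = live_edge_graph(edges, rng)
        for v in nodes:
            totals[v] += reachable_size(g, v)
    return {v: t / rounds for v, t in totals.items()}
```

The key saving is that a single sampled graph yields spread estimates for every vertex at once, instead of one simulation per candidate seed.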
Mia The next contribution has been provided in [7], where the authors propose a heuristic algorithm that gains efficiency by restricting computations on the local influence regions of nodes.
The rationale behind such an approach is that they use local arborescence structures of each node to approximate the influence propagation. We recall that, in a directed graph, an arborescence is a tree with all edges either pointing toward the root or away from it.
They use maximum influence paths (MIP) to estimate the influence from one node to another. Let $\mathcal{P}(G, u, v)$ be the set of all paths from $u$ to $v$ in $G$; for a path $P$, let $pp(P)$ denote its propagation probability, i.e., the product of the probabilities of its edges. Then the MIP can be formally defined as follows.
Definition 11 (Maximum influence path).
The maximum influence path for a graph $G$ from node $u$ to node $v$ is defined as:
$$MIP_G(u, v) \;=\; \arg\max_{P \in \mathcal{P}(G, u, v)} pp(P).$$
Notice that, assuming a consistent tie-breaking rule, $MIP_G(u, v)$ is always unique, and any subpath of $MIP_G(u, v)$ from $x$ to $y$ is also $MIP_G(x, y)$.
Thus, MIPs between every pair of nodes are first computed by adapting the Dijkstra shortest-path algorithm, ignoring MIPs with probability smaller than some influence threshold $\theta$. Then, the MIPs starting or ending at each node are merged into arborescence structures, which represent the local influence regions of each node. The authors consider influence propagated only through these local arborescences; the resulting model is called the Maximum Influence Arborescence (MIA) model. The influence spread in the MIA model is submodular, and so the usual greedy algorithm guarantees an influence spread within a factor $(1 - 1/e)$ of the optimal solution. Experimentally compared with the other described heuristics, the MIA algorithm is always among the best and in most cases significantly outperforms the remaining heuristics, with a margin of as much as 100-260% in influence spread. Moreover, the authors show that by tuning $\theta$ it is possible to adjust the tradeoff between efficiency (in terms of running time) and effectiveness (in terms of influence spread).
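Since maximizing a product of edge probabilities is equivalent to minimizing the sum of their negative logarithms, the MIP computation reduces to a standard Dijkstra search, as in this hedged sketch (edge format and function names are our assumptions):

```python
import heapq
import math

def max_influence_path(edges, src, dst):
    """Maximum-influence path: maximize the product of edge probabilities,
    which equals a shortest path under weights -log(p) (Dijkstra)."""
    adj = {}
    for u, v, p in edges:
        adj.setdefault(u, []).append((v, -math.log(p)))
    dist, prev = {src: 0.0}, {}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(pq, (d + w, v))
    if dst not in dist:
        return None, 0.0
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1], math.exp(-dist[dst])
```

For example, with probabilities 0.5 on a two-hop path and 0.2 on the direct edge, the two-hop path wins because its product 0.25 is higher.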
Celf++ Another algorithm has been proposed in [25], namely CELF++, which improves CELF by introducing additional heuristic optimizations exploiting submodularity. It is empirically 17-34% faster than CELF.
Irie In [29] a novel algorithm that outperforms MIA has been designed, IRIE.
The rationale behind this new algorithm is a novel Influence Ranking method, IR, derived from a belief propagation approach, which uses a small number of iterations to generate a global influence ranking of the nodes and then selects the highest ranked node as the seed. However, the influence ranking is only good for selecting one seed: if we use the ranking to directly select the top $k$ ranked nodes as seeds, their influence spreads may overlap with one another and not result in the best overall influence spread. To overcome this issue, the authors integrate IR with a simple Influence Estimation, IE: after one seed is selected, they estimate the additional influence impact of this seed on each node in the network, which is much faster than estimating the marginal influence of many seed candidates, and then use the results to adjust the next round of influence ranking computation. Combining IR and IE together yields IRIE.
An interesting result holds for IR when applied to tree graphs. For each node $u$, let $r(u)$ be the expected number of nodes activated when $u$ is the only seed; for each edge $(u, v)$, let $r_{-v}(u)$ denote the expected number of activated nodes when $u$ is the only seed and $v$ is removed from $G$. Let $\hat{r}(u)$ and $\hat{r}_{-v}(u)$ be the IR estimates of $r(u)$ and $r_{-v}(u)$, respectively.
The following theorem holds.
Theorem 9.
For any tree graph, $\hat{r}(u) = r(u)$ for each node $u$, and $\hat{r}_{-v}(u) = r_{-v}(u)$ for each edge $(u, v)$.
Finally, IRIE has an additional feature: it also works on the IC-N diffusion model, i.e., the Independent Cascade model in which negative opinions can spread among the nodes as well.
Quasi Linear Time Algorithm In [2], the authors provide a significant speedup in computing a solution for IMP under the IC diffusion model, with an approximation factor of $(1 - 1/e - \varepsilon)$, achieving a running time that is quasi-linear in the size of the network, a substantial improvement over the complexity achieved in [8]. Here $n$ denotes the number of nodes, $m$ the number of edges, and $k$ the budget for the seeds. The proposed algorithm is randomized and its basic version succeeds with constant probability; since failure is detectable, the success probability can be increased by repetition.
To understand the approach underlying the algorithm, let us start by considering the problem of finding the single node with highest influence.
Consider the following polling process: pick a node $u$ uniformly at random, and determine the set of nodes that would have influenced $u$.
Intuitively, if we repeat this process multiple times and a certain node $v$ appears often as an influencer, then $v$ is likely a good candidate for the most influential node.
In fact, the authors show that the probability that a node appears in a set of influencers is proportional to the expected number of nodes that it would influence, and standard concentration bounds show that this probability can be estimated accurately with relatively few repetitions of the polling process. Moreover, it is possible to efficiently find the set of nodes that would have influenced a node $u$: this can be done by simulating the influence process, starting from $u$, in the transpose graph (i.e., the original network with edge directions reversed).
Thus, the algorithm proceeds in two steps.
First, it repeatedly applies the random sampling technique described above to generate a sparse hypergraph representation of the network. Each hypergraph edge corresponds to a set of individuals that was influenced by a randomly selected node in the transpose graph.
This hypergraph encodes the influence estimates: for a set $S$ of nodes, the total degree of $S$ in the hypergraph is approximately proportional to the influence of $S$ in the original graph.
In the second step, the standard greedy algorithm is run on the hypergraph to return a set of $k$ nodes of approximately maximal total degree.
The final complexity of the algorithm is dominated by the running time required by the first phase.
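The two-step polling scheme can be sketched as follows: sample reverse-reachable sets by reverse simulation and then run greedy maximum coverage over them; this is a simplified illustration of the idea in [2], with all names and parameters our own.

```python
import random

def random_rr_set(nodes, in_edges, rng):
    """Reverse-reachable set: pick a uniform node and run a reverse IC
    simulation (BFS in the transpose graph, crossing each incoming edge
    with its probability)."""
    target = rng.choice(nodes)
    rr, frontier = {target}, [target]
    while frontier:
        v = frontier.pop()
        for u, p in in_edges.get(v, []):
            if u not in rr and rng.random() < p:
                rr.add(u)
                frontier.append(u)
    return rr

def ris_greedy(nodes, in_edges, k, samples=2000, seed=0):
    """Greedy max coverage over sampled RR sets: nodes appearing in many
    RR sets are likely influential."""
    rng = random.Random(seed)
    rr_sets = [random_rr_set(nodes, in_edges, rng) for _ in range(samples)]
    seeds, covered = set(), [False] * len(rr_sets)
    for _ in range(k):
        counts = {}
        for i, rr in enumerate(rr_sets):
            if not covered[i]:
                for u in rr:
                    counts[u] = counts.get(u, 0) + 1
        if not counts:
            break
        best = max(counts, key=counts.get)
        seeds.add(best)
        for i, rr in enumerate(rr_sets):
            if best in rr:
                covered[i] = True
    return seeds
```

On a star network whose center reaches every leaf with probability 1, the center appears in every sampled RR set and is always selected first.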
Tim Starting from these results, in [56] an even faster algorithm to solve IMP under IC has been proposed, namely TIM (Two-phase Influence Maximization), built upon the notion of reverse reachable set.
Definition 12 (Reverse reachable set (RR)).
Let $v$ be a node in $G$, and let $g$ be a graph obtained by removing each edge $e$ of $G$ with probability $1 - p(e)$. The reverse reachable set for $v$ in $g$ is the set of nodes in $g$ that can reach $v$.
This way, a size-$k$ node set that covers a large number of reverse reachable sets is derived and then returned as the final result.
Skim Still aiming at higher scalability, in [15] another approximation algorithm for IMP under IC is proposed. The SKIM (SKetch-based Influence Maximization) algorithm works on per-node summary structures, called combined reachability sketches. The sketch of a node compactly represents its influence coverage across $\ell$ sampled instances of the probabilistic network. The combined reachability sketch of a node is the bottom-$k$ min-hash sketch [14] of the combined reachability set of the node, thus generalizing the reachability sketches of Cohen [13], which were defined for a single instance. The rationale behind this concept is to capture the reachability information of the nodes across different instances. Here, the parameter $k$ is a small constant that determines the tradeoff between computation and accuracy. Bottom-$k$ sketches of sets support cardinality estimation, which means that we can estimate the influence of a node, or of a set of nodes, from their combined reachability sketches. SKIM scales by running the greedy algorithm in sketch space, always taking a node with the maximum estimated (rather than exact) marginal contribution. SKIM computes combined reachability sketches, but only until the node with the maximum estimated influence is found. This node is then added to the seed set. Next, the sketches are updated w.r.t. a residual problem in which the node just selected into the seed set and its influence are no longer present. SKIM resumes the sketch computation, starting from the residual sketches, and again stops when a node with maximum estimated influence is found; a new residual problem is then computed. This process is iterated until the seed set reaches the desired size. Since the residual problem becomes smaller with the iterations, very large seed sets can be computed very efficiently.
Another important contribution is the introduction of influence oracles: after a preprocessing that is almost linear, influence queries can be answered very efficiently, considering only the sketches of the query seed set. Specifically, for $\ell$ instances with $n$ nodes and $m$ edges, the sketches are built in time nearly linear in the total size of the instances. The influence of a set $S$ can then be approximated from the sketches of the nodes in $S$. The oracle applies the union cardinality estimator [16] to estimate the size of the union of the influence sets of the seed nodes. The query runs in time proportional to the total sketch size of the seeds and provides an unbiased estimate with a well-concentrated relative error of about $1/\sqrt{k}$. Notice that, while the preprocessing depends on the number of instances, the sketch size and the approximation quality only depend on the sketch parameter $k$.
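The cardinality estimation supported by bottom-$k$ min-hash sketches can be illustrated with the following sketch; the estimator and the random "hash" values are standard textbook choices, assumed here rather than taken from [15].

```python
def bottom_k_sketch(items, k, hash_of):
    """Bottom-k min-hash sketch: the k smallest hash values of the set."""
    return sorted(hash_of[x] for x in items)[:k]

def merge_sketches(sketches, k):
    """The sketch of a union is computable from the sketches alone:
    the k smallest values among all sketch entries."""
    return sorted(set().union(*map(set, sketches)))[:k]

def estimate_cardinality(sketch, k):
    """With hashes uniform on (0, 1), the k-th smallest hash t of a set of
    size N concentrates around k/N, so (k - 1) / t estimates N."""
    if len(sketch) < k:
        return float(len(sketch))  # small set: the sketch holds it exactly
    return (k - 1) / sketch[k - 1]
```

Merging sketches and estimating the union cardinality is exactly what lets the oracle answer seed-set influence queries without touching the graph again.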
Linear Threshold model
Spin In [40], the authors propose a Shapley value based heuristic, ShaPley value based Influential Nodes (SPIN), for the LT model. However, SPIN only relies on the evaluation of influence spreads of seed sets, and thus does not use specific features of the LT model. Moreover, SPIN is not scalable.
Ldag A significant step has been taken in [11], where it is shown that computing influence in Directed Acyclic Graphs (DAGs) can be done in time linear w.r.t. the size of the graphs. Because of the fast computation in DAGs, the authors propose the first scalable influence maximization algorithm designed for the LT model.
First, they show that computing the exact influence spread in the LT model is #P-hard, even if there is only one seed in the network. This hardness result is important since it closes an open problem left in [30] and further indicates that the greedy algorithm may be intrinsically difficult to make more efficient. Then, it is shown that computing the influence spread in DAGs can be done in linear time, relying on an important linear relationship between the activation probability of a node and those of its in-neighbors in DAGs. Next, based on the fast influence computation for DAGs, the authors propose the first scalable heuristic algorithm tailored for influence maximization in the LT model, namely LDAG. The rationale is to construct a local DAG surrounding every node $v$ in the network, and to restrict the influence to $v$ to be within its local DAG structure, which makes the influence computation tractable and fast. To select local DAGs covering a significant portion of the influence propagation, the authors propose a fast greedy algorithm that adds nodes into the local DAG of a node $v$ one by one, as long as the individual influence of these nodes on $v$ is larger than a threshold parameter. After constructing the local DAGs, the greedy approach of selecting seeds providing the maximum incremental influence spread is combined with a fast scheme for updating the incremental influence spread of every node. The combined fast local DAG construction and fast incremental influence updates make the LDAG algorithm very efficient. Experimental results show that LDAG scales to networks with millions of nodes and edges, while the optimized greedy algorithm already takes days on networks with 64K nodes. In terms of influence spread, LDAG is always very close to the greedy algorithm in all test cases, showing that it achieves the same level of influence spread while running orders of magnitude faster.
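The linear relationship exploited by LDAG can be made concrete with a short sketch computing exact LT influence on a DAG in a single topological pass; the data layout is assumed for illustration.

```python
def dag_influence(topo_order, in_edges, seeds):
    """Exact LT-model influence in a DAG: the activation probability of a
    non-seed node is the weighted sum of its in-neighbors' activation
    probabilities (the linear relation), computed in one topological pass.
    `in_edges` maps v -> [(u, w)] with w the influence weight of edge (u, v)."""
    ap = {}
    for v in topo_order:
        if v in seeds:
            ap[v] = 1.0
        else:
            ap[v] = sum(ap[u] * w for u, w in in_edges.get(v, []))
    return sum(ap.values())
```

On a chain 1 -> 2 -> 3 with weights 0.5, seeding node 1 gives activation probabilities 1, 0.5, and 0.25, hence an influence of 1.75, computed with no simulation at all.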
Dynamic Diffusion model
FastMargin In [62], the authors prove that IMP under DynaDiffuse is NP-hard. Moreover, they prove that, also in this model, the influence function is monotonic and submodular, thus the standard greedy algorithm achieves the usual approximation factor of $(1 - 1/e)$. However, a faster algorithm is provided here. First, it predicts a relatively precise influence spread using stochastic model checking on the CTMC parallel composition. Then, when selecting an additional node to be added to the current initial set, it predicts the increased marginal influence spread of adding each candidate node. To improve efficiency, this marginal influence spread is estimated using a faster discounted formula instead of determining a precise value with stochastic model checking.
For scalability to large networks, they also adopt the CELF lazy evaluation approach [35], which dramatically reduces the number of evaluations of the influence function. Such an algorithm can be flexibly used in any continuous-time constrained diffusion model assuming that the diffusion rate follows an exponential distribution. The total complexity depends on the maximum degree over all the nodes and on the size of the state set of the underlying CTMC.
Heat Diffusion model
Cim Adopting the Heat Diffusion Model, in [12] a novel technique is employed to detect the most promising nodes that will maximize the spread throughout the network. They propose a threelevel approach based on the idea of community. The three phases of Communitybased Influence Maximization (CIM) are community detection, candidate generation, and seed selection. Let us analyze them.
First, the authors argue that nodes in social networks naturally cluster together. To reduce the overhead incurred in computing influence spreads based on HDM, they exploit this clustering phenomenon to avoid the computation of overlapping influence spreads among nodes in the same community. In other words, once a node in a community is selected, it is very unlikely that selecting another node from the same community yields better results than placing a seed in a node from a different one.
For the clustering algorithm, H_Clustering, they incorporate the notion of modularity, as expressed in [43], based on a bottom-up approach that iteratively merges nodes with strong structural similarity into communities. First, for each node in the given social network, the algorithm derives the structural similarity between the node and its neighboring nodes, and uses it as the weight of the corresponding edges. Then, each node is initially considered as a community, and each pair of nodes is grouped into a community if the structural similarity between the two nodes is the largest among their surrounding edges. In other words, given two nodes $u$ and $v$, if the weight of edge $(u, v)$ is the largest among all edges incident to $u$ and also the largest among all edges incident to $v$, then $u$ and $v$ are merged into a community. Next, each newly created community becomes a node, and the process continues until a termination condition is reached. Such a condition should measure the quality of the discovered communities in order to decide when to stop the community detection process; this is why, as done in [28], they adopt the modularity gain [22]. To reduce the search space of seed nodes, one way is to analyze the community structures and prune off insignificant communities and their nodes. Finally, even though we may want to avoid placing multiple seeds in the same community, the size of a community is still a factor for seed placement, i.e., communities should not all be treated the same, since placing seeds in a large community could trigger more nodes than in a small one.
To select the seeds from the large communities just built, the centroids of the communities may appear as the natural candidates. However, hubs, i.e., nodes connecting different communities, should also be carefully considered. A community is called significant if it has a number of nodes larger than the average number of nodes a seed may influence in the given influence maximization task. In order to generate a small set of seed candidates, the approach considers as candidates only the potential centroid nodes (i.e., nodes with high degree and large score sum) in significant communities and the hub nodes, eliminating outliers and nodes in insignificant communities. The candidate set then consists of the top high-degree and large-score-sum nodes of each significant community, together with all the hub nodes connecting significant communities.
Finally, we should select the seeds out of all the candidates. For various combinations of seeds selected from the candidate seed set, the computation of the influence spreads is carried out by simulating how influence spreads from those seeds based on the heat diffusion model. To overcome the computational issues arising in this task, the authors designed a two-step approach. First, a quota-based method is employed to determine the number of seeds to be allocated to each significant community, and then which nodes should be selected within that community, based on a heuristic combining position score and hub purity. In the second step, CIM heuristically finds a new candidate seed which may potentially increase the influence spread and swaps it with a seed node, aiming to obtain a better seed set, basically applying a local search approach. This process is repeated until the influence spread does not improve any more or a certain number of seed swaps has been performed.
3.3 Seeds Placement
We now turn to the results concerning new features that have been added to IMP w.r.t. the seeds placement.
FindHighDegreeVertex In [3], the possible operations that may be performed on the graph are restricted to jump (visit a uniformly random node) and crawl (move to a neighbor of an already visited node), and the number of such operations is adopted as the complexity measure. Recall that the goal is to find interesting individuals in the network both in terms of degree and in terms of clustering coefficient (CC).
Given an approximation factor $c > 1$, we want to find a vertex $v$ such that $\deg(v) \ge \Delta/c$, with $\Delta$ being the maximum degree among the nodes of the network. One path to find such a vertex is the following: if $\Delta$ is smaller than $c$, then any vertex satisfies the condition. Otherwise, observe that the expected size of a uniform random sample needed to hit a neighbor of a maximum-degree vertex is $n/\Delta$, since such a vertex has $\Delta$ neighbors. Thus, we can just sample that many nodes: either one of them already has degree higher than $\Delta/c$, or the sought vertex is the neighbor of one of the sampled nodes, and one crawl step reaches it.
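The jump-and-crawl idea can be illustrated with a tiny Python sketch (a loose illustration only, not the algorithm of [3]; the adjacency-dict layout is an assumption):

```python
import random

def find_high_degree_vertex(adj, samples, seed=0):
    """Jump-and-crawl sketch: jump to uniformly random nodes, crawl to their
    neighbors, and return the highest-degree vertex seen.
    `adj` maps every node to the list of its neighbors."""
    rng = random.Random(seed)
    nodes = list(adj)
    best = None
    for _ in range(samples):
        v = rng.choice(nodes)                # jump
        for u in [v] + list(adj[v]):         # crawl to the neighbors of v
            if best is None or len(adj[u]) > len(adj[best]):
                best = u
    return best
```

On a star graph, every jump lands either on the center or on a leaf whose only neighbor is the center, so even very few samples find the maximum-degree vertex.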
FindHighDegreeVertex works assuming that the value of $\Delta$ is known: to remove this constraint, it suffices to guess the possible values of $\Delta$ in a logarithmic (doubling) fashion. For such an algorithm, the following result holds.
Theorem 10.
For any , FindHighDegreeVertex approximates the maximum degree to an expected multiplicative factor of performing jump and crawl operations.
Even more interestingly, the authors show that the algorithm is optimal up to a logarithmic factor, as stated in the next result.
Theorem 11.
For any , if some algorithm performs at most jump and crawl operations, then approximates the maximum degree to an expected multiplicative factor of .
Next, we consider graphs with the power law property, since the degree distributions of many real networks resemble such a behavior. Intuitively, by power law we mean that the fraction of vertices with degree $d$ is close to $d^{-\beta}$, for a suitable exponent $\beta$ and for $d$ big enough. We can state the following.
Theorem 12.
Let , with . Then FindHighDegreeVertex can be adapted so that it performs jump and crawl operations, approximating the maximum degree to an expected multiplicative factor of .
The last step is considering networks created by the preferential attachment process of Barabási [1]. FindHighDegreeVertex can be adapted to such cases by exploiting the idea of a lazy random walk, leading to the following result.
Theorem 13.
Let . Then, there exists a modification of FindHighDegreeVertex that approximates up to an expected multiplicative ratio of performing jump and crawl operations.
We are now ready to state the last two results.
Theorem 14.
For any , there exists an algorithm returning a approximation to maximum CC, performing operations.
Theorem 15.
For any , in networks following a power law with and such that the lazy random walk mixes in time , there exists an algorithm returning a approximation to maximum CC, performing operations.
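The lazy random walk used in the results above admits a one-function sketch: at each step the walk stays put with probability 1/2 and otherwise moves to a uniformly random neighbor, which makes the walk aperiodic (names and layout are our own).

```python
import random

def lazy_random_walk(adj, start, steps, seed=0):
    """Lazy random walk: at each step stay put with probability 1/2,
    otherwise move to a uniformly random neighbor of the current node.
    `adj` maps every node to the list of its neighbors."""
    rng = random.Random(seed)
    v = start
    for _ in range(steps):
        if rng.random() < 0.5 and adj[v]:
            v = rng.choice(adj[v])
    return v
```

The self-loop half of each step is what removes periodicity, so the walk's distribution converges on any connected graph, which is why the mixing-time condition above is stated for the lazy variant.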
AdaptiveSeeding The main idea is to have access to a small number of nodes, and then to explore only the neighbors of such nodes, each of which can be accessed with some probability after one of its neighbors is activated. Moreover, at most $k$ seeds can be placed in total [50]. Let us start by tackling the underlying stochastic optimization problem, i.e., deciding which nodes should be selected as starting seeds.
One of the most common techniques in the literature is the Sample Average Approximation method [34], whose rationale is to sample instances from the product distribution over the set of neighbors and then optimize the influence function w.r.t. the sampled cases. If such a function is simple, e.g., additive, then this approach can be directly extended, and when the function is also submodular, the approximation result proved in [41] holds.
However, if the influence function is complex, the only solution is designing a non-adaptive approach, which commits to the nodes before both stages are realized. Of course, such policies can be significantly weaker than adaptive ones. The algorithmic challenge arises in attempting to optimize the influence function of the Triggering model. Interestingly, any influence function in the Triggering model can be adaptively seeded, i.e., there is an algorithm which finds an adaptive policy that is a constant-factor approximation to the optimal adaptive policy, for any influence function in this class. The main idea is to design an algorithm which approximates the optimal randomized-and-relaxed non-adaptive policy and to show that its solution is a good approximation to the optimal adaptive policy. First, a concave function is constructed and the problem is formulated as optimizing this concave function under a mix of integral and fractional linear constraints, showing that solutions to this objective are an upper bound on adaptive policies.
Then, the authors present an algorithm which mimics a gradient-ascent process, taking steps in the direction of the densest gradient, and show that such an algorithm obtains a constant-factor approximation to the optimal adaptive policy.
3.4 Network Uncertainties
Finally, we look at the main results obtained when considering IMP formulated on networks for which we do not have all the information.
Imug In [39], we are given the number of nodes constituting the network, but we do not have any information either on the existence of the different edges or on the probabilities associated with them. The goal here is to reconstruct a network that is as close as possible to the real connections among the individuals, and then solve IMP.
The process consists of a sequence of rounds; in each round we can probe some nodes, select seed nodes, and trigger an influence spread from the selected seeds.
The selected seed nodes become activated, spread influence to their adjacent nodes, and each newly activated (influenced) node recursively repeats the influence spread process according to a given diffusion model.
The proposed heuristic algorithm is called Influence Maximization for Unknown Graphs (IMUG). The rationale behind it is to greedily probe the unprobed node with the highest expected degree, and to greedily select the inactive node with the highest expected degree as a seed node, using the results of past probing and influence spreads. In this setting we have to rely on the available local information about node degrees, since it is not possible to obtain complete knowledge of the topological structure of the graph. Using node degrees is a straightforward solution in situations of limited information, since high-degree nodes tend to spread more influence than low-degree nodes do, and the effectiveness of degree-based heuristic algorithms has been successfully shown [8]. Note that the degree of each node is not available in the problem studied here, and therefore it is estimated from the results of past probing.
As a probing strategy, the authors adopt a biased sampling strategy called Sample Edge Count (SEC), also known as a snowball sampling strategy [38]. SEC greedily probes the node with the highest expected degree: given the set of already probed nodes, SEC estimates the expected degree of each node in the original graph as its degree in the subgraph induced by the probed nodes, and probes the node with the highest such estimate. SEC probing is repeated a fixed number of times per round.
In the initial state, IMUG initializes each node's expected degree; these initial estimates are IMUG parameters. If some knowledge of node degrees, such as the average degree, is available in advance, the estimates can be set using that knowledge; when no information about node degrees is available, a simple option is to initialize them all to the same value. In each round, IMUG updates the expected degree of each node using the results of SEC probing. When a node is probed, its true degree becomes known, and its estimate is fixed to that value. Moreover, each node adjacent to the probed node is revealed to have at least one link, so its expected degree is incremented by one, unless its degree has already been fixed.
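As an illustrative sketch (function and variable names are ours, not from [39]), a single SEC probing step with the degree-estimate update described above can be written as:

```python
def sec_probe(true_adj, probed, degree_est, fixed):
    """One SEC step: probe the unprobed node with the highest expected
    degree, fix its estimate to its now-known true degree, and bump the
    estimate of each not-yet-fixed neighbor by one."""
    candidates = [v for v in degree_est if v not in probed]
    if not candidates:
        return None
    v = max(candidates, key=lambda u: degree_est[u])
    probed.add(v)
    degree_est[v] = len(true_adj[v])  # true degree revealed: fix it
    fixed.add(v)
    for u in true_adj[v]:
        if u not in fixed:            # revealed to have at least one link
            degree_est[u] += 1
    return v
```

Seed selection then simply picks, among the inactive nodes, those with the highest current values in `degree_est`.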
Despite being a heuristic, IMUG achieves a 60–90% influence spread compared with algorithms using the entire social network topology, even when only 1–10% of that topology is known.
Arisen In [60], ideas from both [3] and [39] are combined. The authors propose exploratory influence maximization, looking for an algorithm that makes a small number of queries and returns a set of seed nodes which is approximately as influential as the globally optimal seed set. The approach is not directly comparable with [39], since the algorithm observes all of the nodes which are activated by its chosen seeds, which can reveal a great deal about the network. As in [3], they adopt the jump and crawl operations: when visited, a node reveals all of its edges; the query cost of an algorithm is the total number of nodes visited using either operation, and the goal is to find influential nodes with a query cost much smaller than the total number of nodes.
The graphs adopted for the analysis are drawn from the Stochastic Block Model (SBM), which originated in sociology [23]. In the SBM, the network is partitioned into disjoint communities. Each within-community edge is present independently with one probability and each between-community edge is present independently with another (typically smaller) probability, so that each community is internally drawn as a random graph, with additional random edges to other communities.
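A minimal sampler for this model is easy to sketch (names are ours; this is a plain two-parameter SBM, not tied to the specific instances of [60]):

```python
import random

def sample_sbm(sizes, p_in, p_out, rng=None):
    """Draw one undirected graph from the Stochastic Block Model: nodes are
    split into communities of the given sizes; each within-community edge
    appears independently with probability p_in, each between-community
    edge with probability p_out."""
    rng = rng or random.Random(0)
    block = [c for c, s in enumerate(sizes) for _ in range(s)]
    n = len(block)
    edges = {(i, j)
             for i in range(n) for j in range(i + 1, n)
             if rng.random() < (p_in if block[i] == block[j] else p_out)}
    return block, edges
```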
ARISEN (Approximating with Random walks to Influence a Socially Explored Network) takes as input the model parameters, but is not given any prior information about the realized draw of the network. The rationale behind ARISEN is to sample a set of random nodes and to explore a small subgraph around each one by taking a fixed number of random-walk steps. Both the number of sampled nodes and the walk length are inputs: the former should be large enough to ensure that each of the largest communities is sampled. The explored subgraphs are used to construct a weight vector assigning a weight to each sampled node, and the algorithm then independently samples each seed from the sampled nodes with probability proportional to its weight.
ARISEN uses the random walk around each sampled node to estimate the size of the community that node lies in. From these estimates, it constructs a weight vector that, in expectation, seeds the largest communities. Then, it tests whether a weight vector that puts more weight on large communities would increase the expected influence.
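The sample-explore-weight pipeline can be sketched as follows. This is an illustrative simplification, not the full ARISEN algorithm: here the weight of each sample is just the number of distinct nodes its walk visited, a crude proxy for the community-size estimates used in [60].

```python
import random

def arisen_sketch(adj, num_samples, walk_len, k, rng=None):
    """Sample random nodes, explore a small subgraph around each with a
    short random walk, weight each sample by the number of distinct nodes
    visited, and draw k seeds with probability proportional to weight."""
    rng = rng or random.Random(0)
    nodes = list(adj)
    samples = [rng.choice(nodes) for _ in range(num_samples)]
    weights = []
    for s in samples:
        visited, v = {s}, s
        for _ in range(walk_len):
            if not adj[v]:
                break
            v = rng.choice(adj[v])
            visited.add(v)
        weights.append(len(visited))
    return rng.choices(samples, weights=weights, k=k)
```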
CMAB for IM In [58], the authors assume that the influence probabilities are not known and that it is not possible to estimate them from data alone. This is why they propose to learn them online, employing the Combinatorial Multi-Armed Bandit (CMAB) framework. The authors study the problem adopting the IC model, even though the provided framework is valid for any discrete-time diffusion model. The actual spread is the number of nodes reachable from the selected seed nodes in the true possible world. The mapping between CMAB and IMP is the following:

an arm is associated to each edge;

the reward of an arm represents the status of the corresponding edge, which can be either live, i.e., an activation attempt along it succeeded, or dead;

the mean of an arm is the influence probability of the corresponding edge;

a superarm is associated to the union of the outgoing edges from the seeds;

the reward in each round corresponds to the spread in that round's diffusion attempt.
In each round, the regret minimization algorithm selects a seed set of a given cardinality and plays the corresponding superarm. The seed set can be selected either randomly or by solving IMP with the current influence probability estimates: an oracle takes as input the graph and the estimates of the means, and outputs a seed set. For the case of IMP, the oracle constitutes an approximation oracle [10].
Once the superarm is played, information diffuses in the network and a subset of network edges become live, which leads to a subset of nodes becoming active. Each live edge yields a reward of 1. The round reward is the number of active nodes at the end of the diffusion process, and is thus a nonlinear function of the rewards of the triggered arms. After observing a diffusion, the mean estimates need to be updated. In this context, the notion of a feedback mechanism plays an important role, since it characterizes the information available after a superarm is played; this information is used to update the model and improve the mean estimates. Since IMP is hard even when the true influence probabilities are known, we can only hope to achieve a constant fraction of the optimal expected spread. The regret incurred by the algorithm over the rounds is then defined as the difference between this scaled optimum, accumulated over all rounds, and the expected cumulative spread of the seed sets chosen by the oracle,
where the expectation is over the randomness in the seed sets output by the oracle. The usual feedback mechanism is the edge-level feedback proposed by [10], where we assume that we know the status of each triggered edge in the true possible world. The mean estimate of each arm's distribution can then be updated as the running average of the observed live/dead outcomes of the corresponding edge.
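A minimal sketch of such a running-average update for a single arm (class and attribute names are ours):

```python
class EdgeArm:
    """Mean estimate for one edge's influence probability under edge-level
    feedback: every time the edge is triggered, it reports live (1) or
    dead (0), and the estimate is the running average of these outcomes."""
    def __init__(self):
        self.pulls = 0
        self.mean = 0.0

    def update(self, live):
        self.pulls += 1
        self.mean += (live - self.mean) / self.pulls
```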
However, edge-level feedback is often not realistic, because the success or failure of activation attempts is not generally observable. Unlike the status of edges, it is quite realistic and intuitive that we can observe the status of each node. While this is a more realistic assumption, the disadvantage of node-level feedback is that updating the mean estimate for each edge is more challenging: we do not know which active parent activated the node, or when it was activated. Under edge-level feedback, we assume that we know the status of each triggered edge and use it to update the mean estimates; under node-level feedback, any of the active parents may be responsible for activating a node and we don't know which. The authors provide two ways to solve this issue.
The most common way to infer the edge probabilities given the status of each node in the cascade is Maximum Likelihood Estimation (MLE). They use an MLE formulation similar to those proposed in [42, 49], which describe an offline method for learning influence probabilities, where a fixed set of past diffusion cascades is given as input. The log-likelihood function for a given set of cascades decomposes into one term per node, each modeling the likelihood of observing the cascade w.r.t. that node, given the influence probability estimates.
The authors then provide a bound for this estimator. Unfortunately, the time complexity of this approach does not scale to networks with a large number of edges. To mitigate this, the authors adapt a result from online convex optimization for learning the edge probabilities, in which an online convex optimization framework has been developed to minimize a sequence of convex functions over a convex set [65]. In this case, they solve an online convex optimization problem for each node in the network. It is shown that, with sufficiently many rounds, the parameters learned by the online MLE algorithm are nearly as good as those learned by the offline algorithm.
The other approach is the following. In typical social networks, the influence probabilities are very small, and this causes the number of active parents to be small, too. The authors propose a scheme where one of the active neighbors of the newly activated node is chosen uniformly at random and assigned the credit of the activation; each active parent thus receives credit with equal probability. In other words, the chosen edge is given a reward of 1, whereas the edges corresponding to the other active parents are assigned a reward of zero. The same update used for edge-level feedback is then applied. Observe that we could make a mistake by inferring an edge to be live while it is dead in the true world, or vice versa. We term the probability of such faulty inference the failure probability under node-level feedback. An important question is whether we can bound this probability; this is important since failures could ultimately affect the achievable regret and the error in the learned probabilities. As the number of active nodes in a cascade increases, the error in the mean estimates increases, and it becomes better to use the maximum likelihood approach for credit distribution. The authors empirically find that the proposed node-level feedback achieves competitive performance compared to edge-level feedback.
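The random credit assignment itself is a one-liner; a hedged sketch (function name is ours):

```python
import random

def assign_credit(active_parents, rng=None):
    """Node-level feedback heuristic: pick one active parent uniformly at
    random and give its edge full credit (reward 1); the edges of the
    other active parents get reward 0."""
    rng = rng or random.Random(0)
    chosen = rng.choice(active_parents)
    return {p: (1 if p == chosen else 0) for p in active_parents}
```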
LUGreedy The second way of dealing with uncertainties on social networks is by studying a robustness framework. In [6], for RIMP, the knowledge of the confidence interval of each edge probability is assumed to be part of the input. When the amount of observed information cascades is small, the best robust ratio for the given parameter space can be low, so the output of a RIM algorithm does not come with a good enough worst-case performance guarantee. A natural question is then the following: how can we further sample edges so that the robust ratio is efficiently improved? Consider RIMP with the true influence probabilities unknown. We have the following result.
Theorem 16.
RIMP is hard and, moreover, finding a seed set whose robust ratio exceeds a certain threshold is also hard.
Then, the authors propose the Lower-Upper Greedy (LUGreedy) algorithm, which greedily computes seed sets for both the lower-bound and the upper-bound edge probabilities and outputs the one with the larger influence under the lower-bound probabilities. To evaluate the performance of this output, the gap ratio of the input parameter space is first defined, measuring how far apart the lower and upper bounds are.
Then, LUGreedy achieves the following result.
Theorem 17.
Given a graph, a parameter space, and a budget limit, LUGreedy outputs a seed set within the budget whose robust ratio is bounded in terms of the gap ratio of the parameter space.
Notice that the worst-case bound could be small if the parameter space is not assumed to be tight enough. Moreover, the best possible robust ratio can be so low that the output for RIMP does not provide a satisfying seed set in the worst case. However, according to the Chernoff bound, the more samples we take on an edge, the narrower the confidence interval that guarantees the true probability to be located within it with the desired confidence. Thus, after sampling to obtain a narrower parameter space, we can use the LUGreedy algorithm to get the seed set. The authors propose two ideas for this issue, exploiting properties of additive and multiplicative confidence intervals respectively, and incorporate them into a uniform sampling algorithm.
The first idea is to sample every edge sufficiently many times to shrink its confidence interval, and then feed the narrowed parameter space to LUGreedy as when solving RIMP; the performance is then guaranteed by the factor established for LUGreedy. The second idea is to use the multiplicative confidence interval to reduce the fluctuation of the influence spread, so that LUGreedy still applies: the ratio of influence spreads can be bounded based on the multiplicative relation between the lower and upper bounds. To unify both ideas, they propose the Uniform Sampling for RIM algorithm, which samples every edge the same number of times and uses LUGreedy to obtain the seed set.
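The LUGreedy selection rule can be sketched as follows, assuming influence oracles for the lower-bound and upper-bound probabilities are available (names and the oracle interface are ours; in [6] the influence values would be estimated, e.g., by Monte Carlo simulation):

```python
def lu_greedy(nodes, k, sigma_lower, sigma_upper):
    """LUGreedy sketch: greedily build one seed set against the lower-bound
    influence function and one against the upper-bound function, then keep
    whichever scores higher under the pessimistic lower bound."""
    def greedy(sigma):
        seeds = set()
        for _ in range(k):
            best = max((v for v in nodes if v not in seeds),
                       key=lambda v: sigma(seeds | {v}))
            seeds.add(best)
        return seeds
    s_low, s_up = greedy(sigma_lower), greedy(sigma_upper)
    return s_low if sigma_lower(s_low) >= sigma_lower(s_up) else s_up
```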
Finally, the authors deal with instances in which the influence probabilities differ across edges. The intuition for non-uniform sampling is that the edges along the information cascades of important seeds determine the influence spread, and thus they should be estimated more accurately than edges not along important information cascade paths. Starting from the seed set, once a node is activated, it will try to activate its out-neighbors: every out-edge of an activated node is sampled once, generating a new observation from the latent Bernoulli distribution with the edge's success probability, and the edge's sample counter is increased by one. The process goes on until the end of the information cascade. To solve the problem, the authors propose the Information Cascade Sampling for RIM algorithm, which adopts the information cascade sampling described above to select edges.
SaturateGreedy The other sense in which the word robust has been used is w.r.t. the guarantees that an algorithm may provide across a spectrum of different diffusion models and parameter settings [27]. To model this uncertainty, the authors assume that the algorithm is presented with a set of influence functions and is assured that one of these functions actually describes the influence process, but is not told which one. The set could be finite or infinite. Since the algorithm does not know which function is the true one, it must simultaneously optimize for all objective functions in the set, in the sense of maximizing the minimum, over the set, of the ratio between the influence of the chosen seed set and the influence of an optimal solution computed knowing which function is to be optimized.
First, the authors show that, unless the algorithm gets to exceed the number of seeds by a sufficiently large factor, approximating the objective to within a non-trivial factor is hard. However, when the algorithm does get to exceed the seed set target by such a factor, much better bicriteria approximation guarantees can be obtained. We recall that a bicriteria algorithm gets to pick more nodes than the optimal solution, but is only judged against the optimal solution with the original bound on the number of nodes. Specifically, it is shown that a modification of an algorithm of [35], using an enlarged seed budget, finds a seed set whose influence is within a provable factor of optimal. Moreover, two heuristics are investigated.

Run a greedy algorithm to optimize the worst-case (minimum) objective directly, picking one node at a time.

For each objective function, find a set (approximately) maximizing it. Evaluate each of these sets under the worst-case (minimum) objective, and keep the best one.
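The second heuristic is easy to make concrete; a sketch under our own naming, with the per-objective maximization done greedily:

```python
def per_objective_robust(nodes, k, influence_fns):
    """Greedily optimize each influence function separately, then keep the
    candidate seed set whose worst-case (minimum) value across all
    functions is largest."""
    def greedy(f):
        seeds = set()
        for _ in range(k):
            best = max((v for v in nodes if v not in seeds),
                       key=lambda v: f(seeds | {v}))
            seeds.add(best)
        return seeds
    candidates = [greedy(f) for f in influence_fns]
    return max(candidates, key=lambda s: min(f(s) for f in influence_fns))
```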
From an experimental perspective, they exhibit instances on which both of the heuristics perform very poorly. Then, they focus on more realistic instances, exemplifying the types of scenarios under which robust optimization becomes necessary. The main outcome of the experiments is that, while the algorithm with robustness as a design goal typically (though not always) outperforms the heuristics, the margin is often quite small. Hence, the heuristics may be viable in practice when the influence functions are reasonably similar.
4 Deploying IMP on the Field
Social influence maximization has been successfully deployed in the field in different ways [57], e.g., improving nutrition [32], reducing smoking [52], and reducing HIV spread [45, 64]. Given the increasing importance of the latter issue, in this section we focus on that application.
4.1 Background
Homeless youth are twenty times more likely to be HIV positive than stably housed youth [17]. To reduce rates of HIV infection among youth, many homeless youth service providers conduct peer-leader-based social network interventions [45], where a selected group of homeless youth are trained as peer leaders. This peer-led approach is desirable because service providers have limited resources and homeless youth tend to distrust adults. The training program of these peer leaders includes detailed information about how HIV spreads and what one can do to prevent infection. They are also taught effective ways of communicating this information to their peers [46]. Because of limited financial and human resources, service providers can only train a small number of these youth and not the entire population. As a result, the selected peer leaders in these intervention trainings are tasked with spreading messages about HIV prevention to their peers in their social circles. Using these interventions, service providers aim to leverage social network effects to spread information about HIV and induce behavior change among more and more people in the social network of homeless youth. Unfortunately, there are further constraints that service providers face, so that they can only train 3–4 peer leaders in every intervention. This leads to sequential training, in which groups of 3–4 homeless youth are called one after another for training; during training, they are also asked about friendships that they observe in the real-world social network. This information is then used to improve the selection of the peer leaders for the next intervention. As a result, the peer leaders for these limited interventions need to be chosen strategically so that awareness spread about HIV is maximized in the social network of homeless youth.
4.2 Dynamic Influence Maximization under Uncertainty
We now present two of the most interesting approaches developed to tackle the problem of IMP for HIV prevention. They consider possible uncertainty w.r.t. the existence of the edges, i.e., each edge has an associated probability of existing or not. If an edge exists, then it will have the usual weight indicating the strength of the relationship between the connected individuals.
Consider the set of possible actions that the agent can recommend at every time step, where an action is a set of nodes to be selected. Upon taking an action, the agent observes the uncertain edges adjacent to the chosen nodes, which updates its understanding of the network; the uncertain network is then updated with the observed (additional edge) information. A history of a given length is a tuple of past choices and observations, and a multi-step policy is a function that takes in a history and outputs a node choice for the current time step. As the diffusion model, they adopt a variant of the IC model in which nodes get multiple chances to influence their uninfluenced neighbors.
We are now ready to provide the formal definition of the Dynamic Influence Maximization under Uncertainty (DIME) [63].
Definition 13 (DIME problem).
Consider as input an uncertain network and two integers: the number of rounds and the cardinality of the set of peer leaders that may be selected in each round. Consider the expected total number of influenced nodes at the end of the final stage, given the history of previous observations and actions along with the action chosen at the current time, where the expectation is taken both over the actions chosen according to the policy and over the uncertain edges revealed by those actions. The goal of DIME is to find an optimal multi-step policy maximizing this expectation.
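The interaction structure of the definition above can be sketched as a simple loop (names are ours; `policy` and `observe` stand in for the policy and for the edge-revealing environment of the DIME model):

```python
def run_dime_policy(policy, observe, num_stages):
    """Skeleton of the DIME interaction loop: at each stage the policy maps
    the history of past (action, observation) pairs to the next action, and
    the edges revealed by that action are appended to the history."""
    history = []
    for _ in range(num_stages):
        action = policy(history)
        observation = observe(action)  # uncertain edges revealed by action
        history.append((action, observation))
    return history
```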
Healer The Hierarchical Ensembling based Agent which pLans for Effective Reduction in HIV Spread (HEALER) [63] is a software agent that casts the DIME problem as a Partially Observable Markov Decision Process (POMDP) [44] to compute a multi-step online policy for selecting nodes across the stages. POMDPs are a valuable approach for three reasons. First, service providers select different subsets of nodes sequentially, and each subset of nodes is mapped to a unique POMDP action. Second, service providers do not see the exact network state at any given point in time; HEALER maps each POMDP state to indicate which nodes are already influenced and which are not. Finally, the observations received by service providers about edges connected to peer leaders are analogous to the observations received in POMDPs. HEALER utilizes hierarchical ensembling techniques, creating ensembles of smaller POMDPs at two different levels. First, the original POMDP is divided into several smaller intermediate POMDPs using graph partitioning techniques. Next, each intermediate POMDP is further subdivided into several smaller sampled POMDPs, which are solved in parallel using novel online planning methods: each sampled POMDP executes a Monte Carlo tree search [51] to select the best action in that sampled POMDP. The solutions of these sampled POMDPs are combined to form the solutions of the intermediate POMDPs and, similarly, the solutions of the intermediate POMDPs are combined to form the solution of the original POMDP.
HEALER consists of two major components: a network generation application for gathering information about social networks and a DIME Solver.
The Network Generation Application gathers information about social ties in the homeless youth social network by interacting with youth. Once a fixed number of homeless youth register in the application, HEALER parses the contact lists of all the registered homeless youth on social media and generates the social network between these youth. HEALER adds a link between two people if and only if both are contacts on social media and are registered in its network generation application. Unfortunately, there is uncertainty in the generated network, as friendship links between people who are only friends in real life are not captured by HEALER.
The DIME Solver takes as input the approximate social network previously generated and solves the DIME problem using the HEALER algorithm, which provides the solution as a series of recommendations to homeless shelter officials.
Dosim The Double Oracle for Social Influence Maximization (DOSIM) [61] is a novel algorithm that solves a generalization of the DIME problem. The key motivation behind DOSIM is to be able to select actions, i.e., sets of nodes, across the stages without knowing the exact model parameters. HEALER dealt with this issue by assuming specific values based on suggestions by service providers. DOSIM instead works with interval uncertainty over the parameters related to the edges, i.e., the existence probability and the weight of each edge. This generalizes the model used by HEALER to include higher-order uncertainty over the probabilities, in addition to the uncertainty induced by the probabilities themselves. DOSIM chooses an action which is robust to this interval uncertainty: specifically, it finds a policy that achieves close to optimal value regardless of where the unknown probabilities lie within the intervals. The problem is formalized as a zero-sum game between the algorithm, which picks a policy, and a nature-type adversary, which chooses the model parameters. This game formulation represents a key advance over HEALER's POMDP policy, since it enables DOSIM to output mixed strategies over POMDP policies, making it robust against worst-case propagation probability values. Moreover, DOSIM receives periodic observations that are used to update its belief state, i.e., the probability distribution over different model parameters. The strategy space for the game is intractably large, because there is an exponential number of policies. Thus, DOSIM uses a double oracle approach: by iteratively computing best responses for each player, DOSIM finds an approximate equilibrium of the game without having to enumerate the entire set of policies.
Deployment Process We now describe the approach that has been followed to design and test the HEALER and DOSIM algorithms in a real case study.
During recruitment, the youth take a 20-minute baseline survey, which is used to determine their current risk-taking behaviors (e.g., they are asked about the last time they got an HIV test). After recruitment, the friendship-based social network that connects these homeless youth is generated. The authors rely on two information sources to generate this network: the online contacts of the homeless youth and field observations made by the authors and service providers. Online contacts are used to build a first approximation of the real-world social network, which is then refined using the field observations. All edges inferred in this manner are assumed to be certain edges.
Next, the generated network is used by the software agents to select actions for the stages. In each stage, an action is selected using the pilot's intervention strategy. The peer leaders of this chosen action are then informed about HIV by the pilot study staff during the intervention. These peer leaders also reveal more information about newer friendships, which is incorporated into the network so that the agents can select better actions in the next stage of interventions. The follow-up phase consists of meetings where the peer leaders are asked about any difficulties they faced in talking to their friends about HIV.
Then, during in-person surveys, the youth are asked whether someone from within the pilot study talked to them about HIV prevention methods after the pilot study began. Their answer helps determine whether information about HIV reached them through the social network; thus, these surveys are used to find out the number of youth who got informed about HIV as a result of the interventions. They are also asked to take the same survey about HIV risk that they took during recruitment. These post-intervention surveys allow comparing HEALER, DOSIM, and Degree Centrality (DC) [59] in terms of information spread, i.e., how successful the agents were in spreading HIV information through the social network, and behavior change, i.e., how successful the agents were in causing homeless youth to test for HIV. The experiments showed that the strategies of HEALER and DOSIM were able to improve substantially over DC's information spread. Moreover, peer leaders chosen by HEALER and DOSIM each converted a sizable fraction of the youth to HIV testers, while the ones chosen by DC did not convert any youth to testers.
5 Conclusions and Future Research
In this survey we presented the main results achieved for the influence maximization problem. We started by introducing the main elements characterizing a social network and then formally defined the influence maximization problem, i.e., the problem of selecting a limited set of individuals in the network so as to influence the greatest number of them.
We discussed the main contributions that have been developed w.r.t. the principal features characterizing the problem, namely, the diffusion model, the way of placing the seeds from which the influence diffuses, and the uncertainties that may affect the structure of the network. Then, for each case, we reported the most important theoretical results and algorithmic solutions that have been designed, showing how they evolved and have been adapted to various scenarios. Finally, we presented a real case in which such techniques have been exploited to maximize HIV awareness among youth by identifying natural leaders in different communities. The proposed algorithms show remarkable performance w.r.t. the baseline on synthetic data, and demonstrated their effectiveness also in the pilots that have been conducted.
Future work in this field could follow different directions. Besides further improving the scalability of the algorithms, novel features could be integrated either into the diffusion models or into the ways seeds are placed or accessed in the network. Independently of the direction that is explored, we believe that future research lines will be strongly driven by applications since, now more than ever, social networks can raise people's attention to crucial issues and truly inform them; we hope that in the future such tools will be employed more and more often for social good.
Appendix A Notation Table
We report in Table 1 the main symbols used throughout the paper.
Symbol  Meaning  

Model 
Graph  
Set of vertices  
Cardinality of the set of vertices  
Vertex  
Set of edges  
Cardinality of the set of edges  
/  Edge  
/  Influence probability of edge /  
Set of seeds  
Seed  
Cardinality of the initial set of seeds  
Optimal set of seeds of cardinality  
Influence function  
Expected number of nodes influenced by seeds  
Time instant  
Activation threshold of node  
Degree of vertex  
Highest degree of the vertices in the network  
Clustering coefficient of vertex  
Acronyms 
IMP  Influence Maximization Problem 
RIMP  Robust Influence Maximization Problem  
IC  Independent Cascade diffusion model  
LT  Linear Threshold diffusion model 
Footnotes
 Throughout the paper, we will interchangeably adopt either notation to denote the influence probability of one endpoint of an edge w.r.t. the other.
 For the sake of presentation, here we will introduce only the heat diffusion model for undirected graphs. See [37, Sections 3.2–3.3], for further details on directed graphs with and without prior knowledge of the diffusion probability.
 Since we defined the Heat Diffusion Model on undirected graphs, the out-degree of each node is equal to its in-degree.
 Please, refer to [62, Section 4] for more details on CTMCs.
 As the careful reader may notice, this is exactly the same terminology we employed to describe the Triggering Diffusion model in Section 2.2.3.
 The reader can observe the similarities between SCP and IMP. For more details, please refer to [30].
 All the hardness and approximability results reported here for IC also hold for ICN since, for this model too, the influence function is nonnegative, monotonic, and submodular.
 For more details, please see [8, Section 2].
 The formula for the complexity of the algorithm reported above corresponds to the version of the algorithm for which the success probability has been amplified.
 Besides working when IC is adopted as diffusion model, TIM also supports the Triggering Diffusion model.
 A lazy random walk on a connected graph prescribes staying in the current vertex with probability 1/2 and moving to a uniformly chosen random neighbor with probability 1/2. Such a random walk forms an ergodic Markov chain.
 We recall that corresponds to the usual bigO notation but ignoring any logarithmic factor and that the mixing time of a Markov chain is the time until the Markov chain is close enough to its steady state distribution.
 The authors provide an algorithmic scheme that can be adopted with both the IC and LT diffusion models.
 For more details, please see [58, Theorem 1].
References
 Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
 Christian Borgs, Michael Brautbar, Jennifer Chayes, and Brendan Lucier. Maximizing Social Influence in Nearly Optimal Time. In ACM-SIAM Symposium on Discrete Algorithms, pages 946–957, 2014.
 Michael Brautbar and Michael J. Kearns. Local Algorithms for Finding Interesting Individuals in Large Networks. 2010.
 Sébastien Bubeck, Nicolò Cesa-Bianchi, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Foundations and Trends® in Machine Learning, 5(1):1–122, 2012.
 Wei Chen, Alex Collins, Rachel Cummings, Te Ke, Zhenming Liu, David Rincon, Xiaorui Sun, Yajun Wang, Wei Wei, and Yifei Yuan. Influence Maximization in Social Networks when Negative Opinions May Emerge and Propagate. In SIAM International Conference on Data Mining, pages 379–390, 2011.
 Wei Chen, Tian Lin, Zihan Tan, Mingfei Zhao, and Xuren Zhou. Robust Influence Maximization. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 795–804, 2016.
 Wei Chen, Chi Wang, and Yajun Wang. Scalable Influence Maximization for Prevalent Viral Marketing in Large-scale Social Networks. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1029–1038. ACM, 2010.
 Wei Chen, Yajun Wang, and Siyu Yang. Efficient Influence Maximization in Social Networks. In ACM International Conference on Knowledge Discovery and Data Mining, pages 199–208, 2009.
 Wei Chen, Yajun Wang, and Yang Yuan. Combinatorial Multi-armed Bandit: General Framework and Applications. In International Conference on Machine Learning, pages 151–159, 2013.
 Wei Chen, Yajun Wang, Yang Yuan, and Qinshi Wang. Combinatorial Multi-armed Bandit and its Extension to Probabilistically Triggered Arms. The Journal of Machine Learning Research, 17(1):1746–1778, 2016.
 Wei Chen, Yifei Yuan, and Li Zhang. Scalable Influence Maximization in Social Networks under the Linear Threshold Model. In International Conference on Data Mining, pages 88–97, 2010.
 Yi-Cheng Chen, Wen-Yuan Zhu, Wen-Chih Peng, Wang-Chien Lee, and Suh-Yin Lee. CIM: Community-based Influence Maximization in Social Networks. ACM Transactions on Intelligent Systems and Technology (TIST), 5(2):25, 2014.
 Edith Cohen. Size-estimation Framework with Applications to Transitive Closure and Reachability. Journal of Computer and System Sciences, 55(3):441–453, 1997.
 Edith Cohen. All-distances Sketches, Revisited: HIP Estimators for Massive Graphs Analysis. IEEE Transactions on Knowledge and Data Engineering, 27(9):2320–2334, 2015.
 Edith Cohen, Daniel Delling, Thomas Pajor, and Renato F. Werneck. Sketch-based Influence Maximization and Computation: Scaling up with Guarantees. In ACM International Conference on Information and Knowledge Management, pages 629–638, 2014.
 Edith Cohen and Haim Kaplan. Leveraging Discarded Samples for Tighter Estimation of Multiple-set Aggregates. In ACM SIGMETRICS Performance Evaluation Review, volume 37, pages 251–262, 2009.
 NH Council. HIV/AIDS among Persons Experiencing Homelessness: Risk Factors, Predictors of Testing, and Promising Testing Strategies, 2012.
 Pedro Domingos. Mining Social Networks for Viral Marketing. IEEE Intelligent Systems, 20(1):80–82, 2005.
 Pedro Domingos and Matt Richardson. Mining the Network Value of Customers. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 57–66, 2001.
 Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. From Data Mining to Knowledge Discovery in Databases. AI Magazine, 17(3):37, 1996.
 Scott L. Feld. Why Your Friends Have More Friends than You Do. American Journal of Sociology, 96(6):1464–1477, 1991.
 Zhidan Feng, Xiaowei Xu, Nurcan Yuruk, and Thomas A. J. Schweiger. A Novel Similarity-based Modularity Function for Graph Partitioning. In International Conference on Data Warehousing and Knowledge Discovery, pages 385–396, 2007.
 Stephen E. Fienberg and Stanley S. Wasserman. Categorical Data Analysis of Single Sociometric Relations. Sociological Methodology, 12:156–192, 1981.
 Amit Goyal, Francesco Bonchi, and Laks V. S. Lakshmanan. Learning Influence Probabilities in Social Networks. In ACM International Conference on Web Search and Data Mining, pages 241–250. ACM, 2010.
 Amit Goyal, Wei Lu, and Laks V. S. Lakshmanan. CELF++: Optimizing the Greedy Algorithm for Influence Maximization in Social Networks. In International Conference Companion on World Wide Web, pages 47–48. ACM, 2011.
 Michel Habib, Colin McDiarmid, Jorge Ramirez-Alfonsin, and Bruce Reed. Probabilistic Methods for Algorithmic Discrete Mathematics, volume 16. Springer Science & Business Media, 2013.
 Xinran He and David Kempe. Robust Influence Maximization. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 885–894, 2016.
 Jianbin Huang, Heli Sun, Jiawei Han, Hongbo Deng, Yizhou Sun, and Yaguang Liu. SHRINK: A Structural Clustering Algorithm for Detecting Hierarchical Communities in Networks. In ACM International Conference on Information and Knowledge Management, pages 219–228, 2010.
 Kyomin Jung, Wooram Heo, and Wei Chen. IRIE: Scalable and Robust Influence Maximization in Social Networks. In IEEE International Conference on Data Mining (ICDM), pages 918–923, 2012.
 David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the Spread of Influence through a Social Network. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 137–146, 2003.
 David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the Spread of Influence through a Social Network. Theory of Computing, 11(4):105–147, 2015.
 David A. Kim, Alison R. Hwong, Derek Stafford, D. Alex Hughes, A. James O’Malley, James H. Fowler, and Nicholas A. Christakis. Social Network Targeting to Maximise Population Behaviour Change: A Cluster Randomised Controlled Trial. The Lancet, 386(9989):145–153, 2015.
 Masahiro Kimura and Kazumi Saito. Tractable Models for Information Diffusion in Social Networks. In European Conference on Principles of Data Mining and Knowledge Discovery, pages 259–271, 2006.
 Anton J. Kleywegt, Alexander Shapiro, and Tito Homem-de-Mello. The Sample Average Approximation Method for Stochastic Discrete Optimization. SIAM Journal on Optimization, 12(2):479–502, 2002.
 Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance. Cost-effective Outbreak Detection in Networks. In ACM International Conference on Knowledge Discovery and Data Mining, pages 420–429, 2007.
 Charles X Ling and Chenghui Li. Data Mining for Direct Marketing: Problems and Solutions. In ACM International Conference on Knowledge Discovery and Data Mining, volume 98, pages 73–79, 1998.
 Hao Ma, Haixuan Yang, Michael R. Lyu, and Irwin King. Mining Social Networks using Heat Diffusion Processes for Marketing Candidates Selection. In ACM Conference on Information and Knowledge Management, pages 233–242, 2008.
 Arun S. Maiya and Tanya Y. Berger-Wolf. Benefits of Bias: Towards Better Characterization of Network Sampling. In ACM International Conference on Knowledge Discovery and Data Mining, pages 105–113, 2011.
 Shodai Mihara, Sho Tsugawa, and Hiroyuki Ohsaki. Influence Maximization Problem for Unknown Social Networks. In IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pages 1539–1546, 2015.
 Ramasuri Narayanam and Yadati Narahari. A Shapley Value-based Approach to Discover Influential Nodes in Social Networks. IEEE Transactions on Automation Science and Engineering, 8(1):130–147, 2011.
 George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher. An Analysis of Approximations for Maximizing Submodular Set Functions. Mathematical Programming, 14(1):265–294, 1978.
 Praneeth Netrapalli and Sujay Sanghavi. Learning the Graph of Epidemic Cascades. In ACM SIGMETRICS Performance Evaluation Review, volume 40, pages 211–222. ACM, 2012.
 Mark E. J. Newman. Modularity and Community Structure in Networks. Proceedings of the National Academy of Sciences, 103(23):8577–8582, 2006.
 Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2014.
 Eric Rice. The Positive Role of Social Networks and Social Networking Technology in the Condom-using Behaviors of Homeless Young People. Public Health Reports, 125(4):588–595, 2010.
 Eric Rice, Eve Tulbert, Julie Cederbaum, Anamika Barman-Adhikari, and Norweeta G. Milburn. Mobilizing Homeless Youth for HIV Prevention: A Social Network Analysis of the Acceptability of a Face-to-face and Online Social Networking Intervention. Health Education Research, 27(2):226–236, 2012.
 Matthew Richardson and Pedro Domingos. Mining Knowledgesharing Sites for Viral Marketing. In ACM International Conference on Knowledge Discovery and Data Mining, pages 61–70, 2002.
 Manuel G. Rodriguez, David Balduzzi, and Bernhard Schölkopf. Uncovering the Temporal Dynamics of Diffusion Networks. In International Conference on Machine Learning, pages 561–568, 2011.
 Kazumi Saito, Ryohei Nakano, and Masahiro Kimura. Prediction of Information Diffusion Probabilities for Independent Cascade Model. In International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, pages 67–75, 2008.
 Lior Seeman and Yaron Singer. Adaptive Seeding in Social Networks. In IEEE Symposium on Foundations of Computer Science, pages 459–468, 2013.
 David Silver and Joel Veness. Monte-Carlo Planning in Large POMDPs. In Advances in Neural Information Processing Systems, pages 2164–2172, 2010.
 Fenella Starkey, Suzanne Audrey, Jo Holliday, Laurence Moore, and Rona Campbell. Identifying Influential Young People to Undertake Effective Peer-led Health Promotion: The Example of A Stop Smoking In Schools Trial (ASSIST). Health Education Research, 24(6):977–988, 2009.
 Mani R. Subramani and Balaji Rajagopalan. Knowledge-sharing and Influence in Online Social Networks via Viral Marketing. Communications of the ACM, 46(12):300–307, 2003.
 Jimeng Sun and Jie Tang. A Survey of Models and Algorithms for Social Influence Analysis. In Social Network Data Analytics, pages 177–214. Springer, 2011.
 Jie Tang, Jimeng Sun, Chi Wang, and Zi Yang. Social Influence Analysis in Large-scale Networks. In ACM International Conference on Knowledge Discovery and Data Mining, pages 807–816, 2009.
 Youze Tang, Xiaokui Xiao, and Yanchen Shi. Influence Maximization: Near-optimal Time Complexity Meets Practical Efficiency. In ACM SIGMOD International Conference on Management of Data, pages 75–86, 2014.
 Thomas W. Valente. Network Interventions. Science, 337(6090):49–53, 2012.
 Sharan Vaswani, Laks Lakshmanan, Mark Schmidt, et al. Influence Maximization with Bandits. arXiv:1503.00024, 2015.
 Stanley Wasserman and Katherine Faust. Social Network Analysis: Methods and Applications, volume 8. Cambridge University Press, 1994.
 Bryan Wilder, Nicole Immorlica, Eric Rice, and Milind Tambe. Maximizing Influence in an Unknown Social Network. In International Conference on Autonomous Agents and Multiagent Systems, 2018.
 Bryan Wilder, Amulya Yadav, Nicole Immorlica, Eric Rice, and Milind Tambe. Uncharted but not Uninfluenced: Influence Maximization with an Uncertain Network. In International Conference on Autonomous Agents and Multiagent Systems, pages 1305–1313, 2017.
 Miao Xie, Qiusong Yang, Qing Wang, Gao Cong, and Gerard De Melo. DynaDiffuse: A Dynamic Diffusion Model for Continuous Time Constrained Influence Maximization. In AAAI Conference on Artificial Intelligence, pages 346–352, 2015.
 Amulya Yadav, Hau Chan, Albert Xin Jiang, Haifeng Xu, Eric Rice, and Milind Tambe. Using Social Networks to Aid Homeless Shelters: Dynamic Influence Maximization under Uncertainty. In International Conference on Autonomous Agents and Multiagent Systems, pages 740–748, 2016.
 Amulya Yadav, Bryan Wilder, Eric Rice, Robin Petering, Jaih Craddock, Amanda YoshiokaMaxwell, Mary Hemler, Laura OnaschVera, Milind Tambe, and Darlene Woo. Influence Maximization in the Field: The Arduous Journey from Emerging to Deployed Application. In International Conference on Autonomous Agents and Multiagent Systems, pages 150–158, 2017.
 Martin Zinkevich. Online Convex Programming and Generalized Infinitesimal Gradient Ascent. In International Conference on Machine Learning, pages 928–936, 2003.