Uncertainty about models and data is ubiquitous in the computational social sciences, and it creates a need for robust social network algorithms, which can simultaneously provide guarantees across a spectrum of models and parameter settings. We begin an investigation into this broad domain by studying robust algorithms for the Influence Maximization problem, in which the goal is to identify a set of nodes in a social network whose joint influence on the network is maximized.
We define a Robust Influence Maximization framework wherein an algorithm is presented with a set of influence functions, typically derived from different influence models or different parameter settings for the same model. The different parameter settings could be derived from observed cascades on different topics, under different conditions, or at different times. The algorithm's goal is to identify a set of nodes who are simultaneously influential for all influence functions, compared to the (function-specific) optimum solutions.
We show strong approximation hardness results for this problem unless the algorithm gets to select at least a logarithmic factor more seeds than the optimum solution. However, when enough extra seeds may be selected, we show that techniques of Krause et al. can be used to approximate the optimum robust influence to within a factor of $1 - 1/e$. We evaluate this bicriteria approximation algorithm against natural heuristics on several real-world data sets. Our experiments indicate that the worst-case hardness does not necessarily translate into bad performance on real-world data sets; all algorithms perform fairly well.
1 Introduction
Computational social science is the study of social and economic phenomena based on electronic data, algorithmic approaches and computational models. It has emerged as an important application of data mining and learning, while also invigorating research in the social sciences. Computational social science is frequently envisioned as a foundation for a discipline one could term “computational social engineering,” wherein algorithmic approaches are used to change or mitigate individuals’ behavior.
Among the many concrete problems that have been studied in this context, perhaps the most popular is Influence Maximization. It is based on the observation that behavioral change in individuals is frequently effected by influence from their social contacts. Thus, by identifying a small set of "seed nodes," one may influence a large fraction of the social network. The desired behavior may be of social value, such as refraining from smoking or drug use, using superior crops, or following hygienic practices. Alternatively, the behavior may provide financial value, as in the case of viral marketing, where a company wants to rely on word-of-mouth recommendations to increase the sales of its products.
1.1 Prevalence of Uncertainty and Noise
Contrary to the “hard” sciences, the study of social networks — whether using traditional or computational approaches — suffers from massive amounts of noise inherent in the data and models. The reasons range from the fundamental to the practical:

At a fundamental level, it is not even clear what a "social tie" is. Different individuals or researchers operationalize the intuition behind "friendship", "acquaintance", "regular" advice seeking, etc. in different ways (see, e.g., [4]). Based on different definitions, the same real-world individuals and behavior may give rise to different mathematical models of the same "social network."

Mathematical models of processes on social networks (such as opinion adoption or tie formation) are at best approximations of reality, and frequently mere guesses or mathematically convenient inventions. Furthermore, the models are rarely validated against real-world data, in large part due to some of the following concerns.

Human behavior is typically influenced by many environmental variables, many of them hard or impossible to measure. Even with the rapid growth of available social data, it is unlikely that data sets will become sufficiently rich to disentangle the dependence of human behavior on the myriad variables that may shape it.

Observational data on social behavior is virtually always incomplete. For example, even if API restrictions and privacy were not concerns (which they definitely are at this time) and a "complete" data set of Twitter and Facebook and email communication were collected, it would still lack in-person and phone interactions.

Inferring model parameters relies on a choice of model and hyperparameters, many of which are difficult to make. Furthermore, while for many models, parameter inference is computationally efficient, this is not universally the case.
Since none of these issues is likely to be resolved any time soon, both the models for social network processes and their inferred parameters must be treated with caution. This is true both when one wants to draw scientific insight for its own sake, and when one wants to use the inferred models to make computational social engineering decisions. Indeed, the correctness guarantees for algorithms are predicated on the assumption of correctness of the model and the inferred parameters. When this assumption fails, which is inevitable, the utility of the algorithms' output is compromised. Thus, to make good on the claims of real-world relevance of computational social science, it is imperative that the research community focus on robustness as a primary design goal.
1.2 Modeling Uncertainty in Influence Maximization
We take an early step in this bigger agenda, studying robustness in the context of the well-known Influence Maximization problem. (Detailed definitions are given in Section 3.) In Influence Maximization, the algorithm selects a set $A$ of seed nodes, of prespecified size $k$. The seed nodes are initially exposed to a product or idea; we say that they are active. Based on a probabilistic model of influence propagation (we use the terms "influence propagation" and "diffusion" interchangeably), they cause some of their neighbors to become active, who then cause some of their neighbors to become active, etc.; this process leads to a (random) final set of active nodes. The goal is to maximize the expected size of this set; we denote this quantity by $\sigma(A)$.
The concerns discussed above combine to lead to significant uncertainty about the function $\sigma$: different models give rise to very different functional forms of $\sigma$, and missing observations or approximations in inference lead to uncertainty about the models' parameters.
To model this uncertainty, we assume that the algorithm is presented with a set $\Sigma$ of influence functions, and assured that one of these functions actually describes the influence process, but not told which one. The set $\Sigma$ could be finite or infinite. A finite $\Sigma$ could result from a finite set of different information diffusion models that are being considered, from a finite number of different contexts under which the individuals were observed (e.g., word-of-mouth cascades for different topics or products), or from a finite number of different inference algorithms or algorithm settings being used to infer the model parameters from observations. An infinite (even continuous) $\Sigma$ arises if each model parameter is only known to lie within some given interval; this model of adversarial noise, which we call the Perturbation Interval model, was recently proposed in [24].
Since the algorithm does not know $\sigma$, in the Robust Influence Maximization problem, it must "simultaneously optimize" for all objective functions in $\Sigma$, in the sense of maximizing $\rho(A) = \min_{\sigma \in \Sigma} \sigma(A)/\sigma(A^*_\sigma)$, where $A^*_\sigma$ is an optimal solution knowing which function $\sigma$ is to be optimized. In other words, the selected set should simultaneously get as close as possible to the optimal solutions for all possible objective functions.
1.3 Our Approach and Results
Our work is guided by the following overarching questions:

How well can the objective be optimized in principle?

How well do simple heuristics perform in theory?

How well do simple heuristics perform in practice?

How do robustly and non-robustly optimized solutions differ qualitatively?
We address these questions as follows. First, we show (in Section 4) that unless the algorithm gets to exceed the number of seeds by at least a factor $\ln|\Sigma|$, approximating the objective to within a factor $O(n^{1-\epsilon})$ is NP-hard for all $\epsilon > 0$.
However, when the algorithm does get to exceed the seed set target by a factor of $\ln|\Sigma|$ (times a constant), much better bicriteria approximation guarantees can be obtained. (A bicriteria algorithm gets to pick more nodes than the optimal solution, but is only judged against the optimum solution with the original bound on the number of nodes.) Specifically, we show that a modification of an algorithm of Krause et al. [27] uses $O(k \ln|\Sigma|)$ seeds and finds a seed set whose influence is within a factor $1 - 1/e$ of optimal.
We also investigate two straightforward heuristics:

Run a greedy algorithm to optimize $\rho$ directly, picking one node at a time.

For each objective function $\sigma \in \Sigma$, find a set $A_\sigma$ (approximately) maximizing $\sigma$. Evaluate each of these sets under $\rho$, and keep the best one.
We first exhibit instances on which both of the heuristics perform very poorly. Next (in Section 5), we focus on more realistic instances, exemplifying the types of scenarios under which robust optimization becomes necessary. In the first set of experiments, we infer influence networks on a fixed node set from Twitter cascades on different topics. Individuals' influence can vary significantly based on the topic, and for a previously unseen topic, it is not clear which inferred influence network to use. In additional sets of experiments, we derive data sets from the same MemeTracker data [29], but use different time slices, different inference algorithms and parametrizations, and different samples from confidence intervals.
The main outcome of the experiments is that while the algorithm with robustness as a design goal typically (though not always) outperforms the heuristics, the margin is often quite small. Hence, heuristics may be viable in practice when the influence functions are reasonably similar. A visual inspection of the nodes chosen by different algorithms reveals how the robust algorithm "hedges its bets" across models, while the non-robust heuristic tends to cluster selected nodes in one part of the network.
1.4 Stochastic vs. Adversarial Models
Given its prominent role in our model, the decision to treat the choice of $\sigma \in \Sigma$ as adversarial rather than stochastic deserves some discussion.
First, adversarial guarantees are stronger than stochastic guarantees, and will lead to more robust solutions in practice. Perhaps more importantly, inferring a Bayesian prior over influence functions in $\Sigma$ will run into exactly the type of problem we are trying to address in the first place: data are sparse and noisy, and if we infer an incorrect prior, it may lead to very suboptimal results. Doing so would in turn require us to establish robustness over the values of the hyperparameters of the Bayesian prior over functions.
Specifically for the Perturbation Interval model, one may be tempted to treat the parameters as drawn according to some distribution over their possible range. This approach was essentially taken in [2, 21]. Adiga et al. [2] assume that for each edge independently, its presence/absence was misobserved with a known probability, whereas Goyal et al. [21] assume that for each edge, the actual parameter is perturbed with independent noise drawn uniformly from a known interval. In both cases, under the Independent Cascade model (for example), the edge activation probability can be replaced with the expected edge activation probability under the random noise model, which will provably lead to the exact same influence function $\sigma$. Thus, independent noise for edge parameters, drawn from a known distribution, does not augment the model in the sense of capturing robustness. In particular, it does not capture uncertainty in a meaningful way.
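The claim that independent noise can be absorbed into the expected edge probabilities can be checked numerically on a toy DIC instance. The sketch below is illustrative only: the node names, the probabilities, and the two-point noise model (each $p_e$ shifted by $\pm\delta$ with probability 1/2) are hypothetical choices, not the uniform-noise model of [21]. It computes the influence exactly by enumerating live-edge graphs and compares the noise-averaged influence with the influence under the mean probabilities.

```python
from itertools import product


def reachable(live_edges, seeds):
    """Set of nodes reachable from `seeds` along the given directed edges."""
    adj = {}
    for u, v in live_edges:
        adj.setdefault(u, []).append(v)
    seen, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def exact_dic_influence(probs, seeds):
    """Exact sigma(A): sum over live-edge graphs G of Prob[G] * |nodes reachable from A in G|."""
    edges = list(probs)
    total = 0.0
    for mask in product((False, True), repeat=len(edges)):
        pr, live = 1.0, []
        for e, present in zip(edges, mask):
            pr *= probs[e] if present else 1.0 - probs[e]
            if present:
                live.append(e)
        total += pr * len(reachable(live, seeds))
    return total


def noise_averaged_influence(mean_probs, delta, seeds):
    """Average of sigma(A) when each p_e is independently shifted by +delta or -delta."""
    edges = list(mean_probs)
    total = 0.0
    for signs in product((-1, 1), repeat=len(edges)):
        perturbed = {e: mean_probs[e] + s * delta for e, s in zip(edges, signs)}
        total += exact_dic_influence(perturbed, seeds)
    return total / 2 ** len(edges)


# Hypothetical toy instance: a -> b -> c plus a direct edge a -> c.
mean_probs = {("a", "b"): 0.4, ("b", "c"): 0.6, ("a", "c"): 0.2}
clean = exact_dic_influence(mean_probs, {"a"})
noisy = noise_averaged_influence(mean_probs, 0.15, {"a"})
print(clean, noisy)  # equal up to floating-point rounding
```

The two values agree because each $\mathrm{Prob}[G]$ is a product of independent per-edge factors that are linear in $p_e$, so averaging over independent noise simply replaces every $p_e$ by its mean.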
To model the type of issues one would expect to arise in real-world settings, at the very least, noise must be correlated between edges. For instance, certain subpopulations may be inherently harder to observe or have sparser data to learn from. However, correlated random noise would result in a more complex description of the noise model, and thus make it harder to actually learn and verify the noise model. This difficulty is compounded by the fact, discussed above, that the noise model itself must be learned from noisy data.
2 Related Work
Based on the early work of Domingos and Richardson [13, 38], Kempe et al. [25] formally defined the problem of finding a set of influential individuals as a discrete optimization problem, proposing a greedy algorithm with a $1 - 1/e$ approximation guarantee for the Independent Cascade [16, 17] and Linear Threshold [23] models. A long sequence of subsequent work focused on more efficient algorithms for Influence Maximization (both with and without approximation guarantees) and on broadening the class of models for which guarantees can be obtained [3, 7, 9, 25, 26, 33, 40, 41]. See the recent book by Chen et al. [5] and the survey in [25] for more detailed overviews.
As a precursor to maximizing influence, one needs to infer the influence function from observed data. The most common approach is to estimate the parameters of a particular diffusion model [1, 11, 18, 19, 35, 37, 39]. Theoretical bounds on the required sample complexity for many diffusion models have been established, including [1, 35, 37] for the Discrete-Time Independent Cascade (DIC) model, [11] for the Continuous-Time Independent Cascade (CIC) model, and [35] for the Linear Threshold model. However, it remains difficult to decide which diffusion model fits the observations best. Moreover, diffusion models only serve as a rough approximation to the real-world diffusion process. In order to sidestep the issue of diffusion models, Du et al. [14] recently proposed to directly learn the influence function from the observations, without assuming any particular diffusion model. They only assume that the influence function is a weighted average of coverage functions. While their approach provides polynomial sample complexity, they require a strong technical condition on finding an accurate approximation to the reachability distribution. Hence, their work remains orthogonal to the issue of Robust Influence Maximization.
Several recent papers take first steps toward Influence Maximization under uncertainty. Goyal, Bonchi and Lakshmanan [21] and Adiga et al. [2] study random (rather than adversarial) noise models, in which either the edge activation probabilities are perturbed with random noise [21], or the presence/absence of edges is flipped with a known probability [2]. Neither of the models truly extends the underlying diffusion models, as the uncertainty can simply be absorbed into the probabilistic activation process.
Another approach to dealing with uncertainty is to carry out multiple influence campaigns, and to use the observations to obtain better estimates of the model parameters. Chen et al. [8] model the problem as a combinatorial multi-armed bandit problem and use the UCB1 algorithm with regret bounds. Lei et al. [28] instead incorporate beta distribution priors over the activation probabilities into the DIC model. They propose several strategies to update the posterior distributions and give heuristics for seed selection in each trial so as to balance exploration and exploitation. Our approach is complementary: even in an exploration-based setting, there will always be residual uncertainty, in particular when exploration budgets are limited.
The adversarial Perturbation Interval model was recently proposed in work of the authors [24]. The focus in that work was not on robust optimization, but on algorithms for detecting whether an instance was likely to suffer from high instability of the optimal solution. Optimization for multiple scenarios was also recently used in work by Chen et al. on tracking influential nodes as the structure of the graph evolves over time [10]. However, the model explicitly allowed updating the seed set over time, while our goal is simultaneous optimization.
Simultaneously to the present work, Chen et al. [6] and Lowalekar et al. [32] have been studying the Robust Influence Maximization problem under the Perturbation Interval model [24]. Their exact formulations are somewhat different. The main result of Chen et al. [6] is an analysis of the heuristic of choosing the best solution among three candidates: make each edge's parameter as small as possible, as large as possible, or equal to the middle of its interval. They prove solution-dependent approximation guarantees for this heuristic.
The objective of Lowalekar et al. [32] is to minimize the maximum regret instead of maximizing the minimum ratio. They propose a heuristic based on constraint generation ideas to solve the robust influence maximization problem. The heuristic does not come with approximation guarantees; instead, [32] proposes a solution-dependent measure of robustness of a given seed set. As part of their work, [32] prove a result similar to our Lemma 1, showing that the worst-case instances all have the largest or smallest possible values for all parameters.
3 Models and Problem Definition
3.1 Influence Diffusion Models
For concreteness, we focus on two diffusion models: the discrete-time Independent Cascade model (DIC) [25] and the continuous-time Independent Cascade model (CIC) [19]. Our framework applies to most other diffusion models; in particular, most of the concrete results carry over to the discrete and continuous Linear Threshold models [25, 39].
Under the DIC model, the diffusion process unfolds in discrete time steps as follows: when a node $u$ becomes active in step $t$, it attempts to activate all currently inactive neighbors $v$ in step $t+1$. For each neighbor $v$, it succeeds with a known probability $p_{u,v}$; the $p_{u,v}$ are the parameters of the model. If node $u$ succeeds, $v$ becomes active. Once $u$ has made all its attempts, it does not get to make further activation attempts at later times; of course, the node $v$ may well be activated at time $t+2$ or later by some node other than $u$.
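The DIC process above can be simulated directly, and averaging many runs gives a Monte Carlo estimate of $\sigma(A)$. The sketch below is a minimal illustration under stated assumptions: the dictionary-based graph representation, the function names, and the toy path instance are hypothetical. Newly active nodes are processed in breadth-first order, so every node activated in step $t$ makes its attempts before any node activated in step $t+1$; this yields the same distribution of final active sets as the step-by-step description.

```python
import random
from collections import deque


def simulate_dic(graph, probs, seeds, rng):
    """One run of the DIC process; returns the final set of active nodes.

    graph: dict mapping each node u to a list of out-neighbors v.
    probs: dict mapping each edge (u, v) to its activation probability p_{u,v}.
    """
    active = set(seeds)
    queue = deque(seeds)  # FIFO order = order of activation steps
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            # u gets exactly one attempt on each neighbor that is still inactive
            if v not in active and rng.random() < probs[(u, v)]:
                active.add(v)
                queue.append(v)
    return active


def estimate_influence(graph, probs, seeds, runs=10_000, seed=0):
    """Monte Carlo estimate of sigma(A), the expected number of active nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_dic(graph, probs, seeds, rng)) for _ in range(runs))
    return total / runs


# Hypothetical path a -> b -> c with p = 0.5 on both edges,
# so sigma({a}) = 1 + 0.5 + 0.25 = 1.75.
graph = {"a": ["b"], "b": ["c"]}
probs = {("a", "b"): 0.5, ("b", "c"): 0.5}
estimate = estimate_influence(graph, probs, {"a"})
```

The estimate fluctuates around 1.75; its standard error shrinks as $1/\sqrt{\text{runs}}$, which is why naive simulation becomes expensive when many functions in $\Sigma$ must be evaluated.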
The CIC model describes a continuous-time process. Associated with each edge $(u,v)$ is a delay distribution with parameter $\alpha_{u,v}$. When a node $u$ becomes newly active at time $t_u$, for every neighbor $v$ that is still inactive, a delay time $\Delta_{u,v}$ is drawn from the delay distribution. $\Delta_{u,v}$ is the duration it takes $u$ to activate $v$, which could be infinite (if $u$ does not succeed in activating $v$). Commonly assumed delay distributions include the Exponential distribution or the Rayleigh distribution. If multiple nodes $u_1, \ldots, u_m$ attempt to activate $v$, then $v$ is activated at the earliest time $\min_i (t_{u_i} + \Delta_{u_i,v})$. Nodes are considered activated by the process if they are activated within a specified observation window $[0, T]$.
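Because each node is activated at the earliest arrival time over its active in-neighbors, one CIC run amounts to a shortest-path computation on randomly drawn delays. The sketch below assumes Exponential delays parametrized by a rate (one possible reading of $\alpha_{u,v}$); the function names and the single-edge instance are hypothetical.

```python
import heapq
import math
import random


def simulate_cic(graph, rates, seeds, horizon, rng):
    """One run of the CIC process with Exponential edge delays.

    rates: dict mapping edge (u, v) to the rate of its Exponential delay.
    Returns the set of nodes activated within the window [0, horizon].
    """
    act_time = {s: 0.0 for s in seeds}
    heap = [(0.0, s) for s in seeds]
    heapq.heapify(heap)
    finalized = set()
    while heap:
        t, u = heapq.heappop(heap)
        if u in finalized:
            continue  # stale entry: u was already activated at an earlier time
        finalized.add(u)
        for v in graph.get(u, []):
            if v in finalized:
                continue
            # u draws one delay for each neighbor that is still inactive at time t
            t_v = t + rng.expovariate(rates[(u, v)])
            # v is activated at the earliest arrival time, if it falls in the window
            if t_v <= horizon and t_v < act_time.get(v, math.inf):
                act_time[v] = t_v
                heapq.heappush(heap, (t_v, v))
    return set(act_time)


def estimate_cic_influence(graph, rates, seeds, horizon, runs=20_000, seed=1):
    rng = random.Random(seed)
    total = sum(len(simulate_cic(graph, rates, seeds, horizon, rng)) for _ in range(runs))
    return total / runs


# Hypothetical single edge a -> b with rate 1 and window T = 1:
# sigma({a}) = 1 + Pr[delay <= 1] = 2 - e^{-1}, about 1.632.
est = estimate_cic_influence({"a": ["b"]}, {("a", "b"): 1.0}, {"a"}, horizon=1.0)
```

Swapping in another delay distribution (e.g., Rayleigh) only changes the line that draws the delay.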
A specific instance is described by the class of its influence model (such as DIC, CIC, or others not discussed here in detail) and the setting of the model's parameters; in the DIC and CIC models above, the parameters would be the influence probabilities and the parameters of the edge delay distributions, respectively. Together, they completely specify the dynamic process, and thus a mapping $\sigma$ from initially active sets $A$ to the expected number of nodes active at the end of the process. (The model and virtually all results in the literature extend straightforwardly when the individual nodes are assigned nonnegative importance scores.) We can now formalize the Influence Maximization problem as follows:
Definition 1 (Influence Maximization)
Maximize the objective $\sigma(A)$ subject to the constraint $|A| \le k$.
For most of the diffusion models studied in the literature, including the DIC [25] and CIC [15] models, it has been shown that $\sigma(A)$ is a monotone and submodular function of $A$. (Recall that a set function $f$ is monotone iff $f(S) \le f(T)$ whenever $S \subseteq T$, and is submodular iff $f(S \cup \{x\}) - f(S) \ge f(T \cup \{x\}) - f(T)$ whenever $S \subseteq T$ and $x \notin T$.) These properties imply that a greedy approximation algorithm guarantees a $1 - 1/e$ approximation [36].
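The greedy algorithm referred to above repeatedly adds the element with the largest marginal gain. The sketch below runs it on a hypothetical coverage function, which is monotone and submodular and can be evaluated exactly, so it stands in for an influence function $\sigma$ without any sampling.

```python
def greedy_max(f, ground_set, k):
    """Greedy for max f(A) s.t. |A| <= k; a (1 - 1/e)-approximation for monotone submodular f."""
    chosen = []
    for _ in range(k):
        candidates = [x for x in ground_set if x not in chosen]
        if not candidates:
            break
        # pick the element with the largest marginal gain f(A + x) - f(A)
        best = max(candidates, key=lambda x: f(chosen + [x]) - f(chosen))
        chosen.append(best)
    return chosen


# Hypothetical coverage function: f(A) = number of items covered by the elements of A.
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1}}


def coverage(A):
    return len(set().union(*(covers[x] for x in A)))


chosen = greedy_max(coverage, list(covers), k=2)
print(chosen, coverage(chosen))  # ['a', 'c'] 6
```

Ties are broken by the order of `ground_set`, since `max` returns the first maximal element.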
3.2 Robust Influence Maximization
The main motivation for our work is that often, $\sigma$ is not precisely known to the algorithm trying to maximize influence. There may be a (possibly infinite) number of candidate functions $\sigma$, resulting from different diffusion models or parameter settings. We denote the set of all candidate influence functions by $\Sigma$. (For computation purposes, we assume that the functions are represented compactly, for instance, by the name of the diffusion model and all of its parameters.) We now formally define the Robust Influence Maximization problem.
Definition 2 (Robust Influence Maximization)
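Given a set $\Sigma$ of influence functions, maximize the objective
$$\rho(A) = \min_{\sigma \in \Sigma} \frac{\sigma(A)}{\sigma(A^*_\sigma)},$$
subject to a cardinality constraint $|A| \le k$. Here $A^*_\sigma$ is a seed set with $|A^*_\sigma| \le k$ maximizing $\sigma(A^*_\sigma)$.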
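To make Definition 2 concrete, the sketch below evaluates $\rho(A)$ for a finite $\Sigma$ on a tiny hypothetical instance, computing each $\sigma(A^*_\sigma)$ by brute force (only feasible for toy sizes). The two coverage functions stand in for influence functions that disagree about which node is most influential; all names are illustrative.

```python
from itertools import combinations


def optimum_value(sigma, ground, k):
    """sigma(A*_sigma): brute-force optimum over all seed sets of size at most k."""
    return max(sigma(set(S)) for r in range(k + 1) for S in combinations(ground, r))


def robust_ratio(A, sigmas, ground, k):
    """rho(A) = min over sigma in Sigma of sigma(A) / sigma(A*_sigma)."""
    return min(sigma(set(A)) / optimum_value(sigma, ground, k) for sigma in sigmas)


def coverage_fn(covers):
    return lambda A: len(set().union(*(covers[x] for x in A)))


# Two hypothetical influence functions: sigma1 favors x, sigma2 favors y.
ground = ["x", "y", "z"]
sigma1 = coverage_fn({"x": {1, 2, 3}, "y": {1}, "z": {4}})
sigma2 = coverage_fn({"x": {1}, "y": {1, 2, 3}, "z": {4}})
Sigma = [sigma1, sigma2]

best = max(combinations(ground, 2), key=lambda A: robust_ratio(A, Sigma, ground, 2))
print(best, robust_ratio(best, Sigma, ground, 2))  # ('x', 'y') 0.75
```

Each function's own optimum with $k = 2$ is $\{x, z\}$ or $\{y, z\}$ (value 4), but each of those scores only $2/4$ under the other function; the robust choice $\{x, y\}$ hedges and reaches $3/4$ under both.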
A solution to the Robust Influence Maximization problem achieves a large fraction of the maximum possible influence (compared to the optimal seed set) under all diffusion settings simultaneously. Alternatively, the solution can be interpreted as solving the Influence Maximization problem when the function $\sigma$ is chosen from $\Sigma$ by an adversary.
While Definition 2 per se does not require the $\sigma \in \Sigma$ to be submodular and monotone, these properties are necessary to obtain positive results. Hence, we will assume here that all $\sigma \in \Sigma$ are monotone and submodular, as they are for standard diffusion models. Notice that even then, $\rho$ is the minimum of submodular functions, and as such not necessarily submodular itself [27].
A particularly natural and important special case of Definition 2 is the Perturbation Interval model recently proposed in [24]. Here, the influence model is known (for concreteness, DIC), but there is uncertainty about its parameters. For each edge $e$, we have an interval $I_e = [\ell_e, r_e]$, and the algorithm only knows that the parameter (say, $p_e$) lies in $I_e$; the exact value is chosen by an adversary. Notice that $\Sigma$ is (uncountably) infinite under this model. While this may seem worrisome, the following lemma shows that we only need to consider finitely (though exponentially) many functions:
Lemma 1
Under the Perturbation Interval model for DIC (the result carries over with a nearly identical proof to the Linear Threshold model; we currently do not know if it also extends to the CIC model), the worst case for the ratio in $\rho$ for any seed set $A$ is achieved by making each $p_e$ equal to $\ell_e$ or $r_e$.

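Fix one edge $e_0$, and consider an assignment (fixed for now) of activation probabilities $p_e$ to all edges $e \neq e_0$. Let $x$ denote the (variable) activation probability for edge $e_0$. First, fix any seed set $A$, and define $f_A(x)$ to be the expected number of nodes activated by $A$ when the activation probabilities of all edges $e \neq e_0$ are $p_e$ and the activation probability of $e_0$ is $x$.
We express $f_A(x)$ using the triggering set [25, Section 4.1] approach. Let $\mathcal{G}$ be the set of all possible directed graphs on the given node set $V$. For any graph $G \in \mathcal{G}$, let $R_A(G)$ be the number of nodes reachable from $A$ in $G$ via a directed path, and let $\mathrm{Prob}[G \mid x]$ be the probability that graph $G$ is obtained when each edge $e$ is present in $G$ independently with probability $p_e$ (or $x$, if $e = e_0$). By the triggering set technique [25, Proof of Theorem 4.5], we get that
$$f_A(x) = \sum_{G \in \mathcal{G}} \mathrm{Prob}[G \mid x] \cdot R_A(G).$$
The probabilities for obtaining a graph $G$ are:
$$\mathrm{Prob}[G \mid x] = \begin{cases} x \cdot \prod_{e \in G,\, e \neq e_0} p_e \cdot \prod_{e \notin G} (1 - p_e) & \text{if } e_0 \in G, \\ (1 - x) \cdot \prod_{e \in G} p_e \cdot \prod_{e \notin G,\, e \neq e_0} (1 - p_e) & \text{if } e_0 \notin G. \end{cases}$$
In either case, we obtain a linear function of $x$, so that $f_A(x)$, being a sum of linear functions, is also linear in $x$.
Therefore, the function $g(x) = \max_{|A'| \le k} f_{A'}(x)$, being a maximum of linear functions of $x$, is convex and piecewise linear; moreover, $g(x) \ge 1$ for $k \ge 1$, since every nonempty seed set activates at least itself. Consider any fixed seed set $A$, and the ratio $h_A(x) = f_A(x)/g(x)$. For every $\alpha \ge 0$, its superlevel set $\{x \mid h_A(x) \ge \alpha\}$ is equal to $\{x \mid \alpha g(x) - f_A(x) \le 0\}$. Because $\alpha g(x) - f_A(x)$, a convex function minus a linear function, is convex, its 0-sublevel set is convex. Hence, all superlevel sets of $h_A$ are convex, and $h_A$ is quasiconcave.
Because $h_A$ is quasiconcave, its minimum over the interval $I_{e_0}$ is attained at one of the endpoints of the interval. Hence, we can minimize the ratio (and thus the performance of the seed set $A$) by making $x$ either as small or as large as possible. By repeating this argument for all edges one by one, we arrive at an influence setting minimizing the performance of $A$, and in which all influence probabilities are equal to the left or right endpoint of the respective interval $I_e$.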
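The two facts driving the proof (linearity of $f_A(x)$ in $x$, and the ratio $f_A(x)/g(x)$ being smallest at an endpoint of $I_{e_0}$) can be spot-checked numerically. The sketch below computes $f_A$ exactly via the live-edge sum above on a hypothetical four-node graph with $k = 1$; the graph, the interval $[0.2, 0.9]$, and the grid resolution are illustrative assumptions.

```python
from itertools import combinations, product


def reachable(live_edges, seeds):
    adj = {}
    for u, v in live_edges:
        adj.setdefault(u, []).append(v)
    seen, stack = set(seeds), list(seeds)
    while stack:
        for v in adj.get(stack.pop(), []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def exact_dic_influence(probs, seeds):
    """Sum over live-edge graphs G of Prob[G] * R_A(G)."""
    edges, total = list(probs), 0.0
    for mask in product((False, True), repeat=len(edges)):
        pr, live = 1.0, []
        for e, present in zip(edges, mask):
            pr *= probs[e] if present else 1.0 - probs[e]
            if present:
                live.append(e)
        total += pr * len(reachable(live, seeds))
    return total


# Hypothetical instance: edge e0 = (a, b) has uncertain probability x in [0.2, 0.9].
nodes = ["a", "b", "c", "d"]
fixed = {("a", "c"): 0.5, ("b", "d"): 0.5, ("c", "d"): 0.3}
E0, LO, HI, K = ("a", "b"), 0.2, 0.9, 1


def f(seeds, x):
    """f_A(x): influence of A when p_{e0} = x and all other p_e are fixed."""
    return exact_dic_influence({**fixed, E0: x}, seeds)


def g(x):
    """g(x) = max over seed sets of size K of f_A(x)."""
    return max(f(set(S), x) for S in combinations(nodes, K))


grid = [LO + (HI - LO) * t / 50 for t in range(51)]
for A in ({"a"}, {"b"}, {"c"}):
    values = [f(A, x) / g(x) for x in grid]
    # Lemma 1: the worst case over the interval is attained at an endpoint
    assert min(values) >= min(values[0], values[-1]) - 1e-12
# f_A is linear in x: the midpoint value is the average of the endpoint values
assert abs(f({"a"}, 0.5) - (f({"a"}, 0.2) + f({"a"}, 0.8)) / 2) < 1e-12
```

A grid check like this is only a sanity test on one instance; the lemma itself rests on the convexity argument.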
4 Algorithms and Hardness
Even when $\Sigma$ contains just a single function $\sigma$, Robust Influence Maximization is exactly the traditional Influence Maximization problem, and is thus NP-hard. This issue also appears in a more subtle way: evaluating $\rho(A)$ (for a given $A$) involves taking the minimum of $\sigma(A)/\sigma(A^*_\sigma)$ over all $\sigma \in \Sigma$. It is not clear how to calculate the ratio even for one of the $\sigma$, since the scaling constant $\sigma(A^*_\sigma)$ (which is independent of the chosen $A$) is exactly the solution to the original Influence Maximization problem, and thus NP-hard to compute.
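This problem, however, is fairly easy to overcome: instead of using the true optimum solutions $A^*_\sigma$ for the scaling constants, we can compute approximations $A^g_\sigma$ using the greedy algorithm, because the $\sigma$ are monotone and submodular [36]. Then, because $(1 - 1/e)\,\sigma(A^*_\sigma) \le \sigma(A^g_\sigma) \le \sigma(A^*_\sigma)$ for all $\sigma \in \Sigma$, we obtain that the "greedy objective function"
$$\rho^g(A) = \min_{\sigma \in \Sigma} \frac{\sigma(A)}{\sigma(A^g_\sigma)}$$
satisfies the following property for all sets $A$:
$$(1 - 1/e)\,\rho^g(A) \le \rho(A) \le \rho^g(A). \qquad (1)$$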
Hence, optimizing $\rho^g$ in place of $\rho$ comes at a cost of only a factor $1 - 1/e$ in the approximation guarantee. We will therefore focus on solving the problem of (approximately) optimizing $\rho^g$.
Because each $\sigma$ is monotone and submodular, and the $\sigma(A^g_\sigma)$, just like the $\sigma(A^*_\sigma)$, are just scaling constants, $\rho^g$ is a minimum of monotone submodular functions. However, we show (in Theorem 2, proved in Appendix A) that even in the context of Influence Maximization, this minimum is impossible to approximate to within any polynomial factor. This holds even in a bicriteria sense, i.e., the algorithm's solution is allowed to pick more than $k$ nodes, but is compared only to solutions using $k$ nodes. The result also extends to the seemingly more restricted Perturbation Interval model, giving an almost equally strong bicriteria approximation hardness result there.
Theorem 2
Let $c \in (0, 1)$ and $\epsilon > 0$ be any constants, and assume that $\mathrm{P} \neq \mathrm{NP}$. There are no polynomial-time algorithms for the following problems:

Given $n$ nodes and a set $\Sigma$ of influence functions on these nodes (derived from the DIC or CIC models), as well as a target size $k$. Find a set $A$ of $(1 - c)\,k \ln|\Sigma|$ nodes, such that $\rho(A) \ge \rho(A^*)/n^{1-\epsilon}$, where $A^*$ is the optimum solution of size $k$.

Given a graph $G$ on $n$ nodes and intervals $I_e$ for edge activation probabilities under the DIC model (or intervals for edge delay parameters under the CIC model), as well as a target size $k$. Find a set $A$ of cardinality $c' k \ln n$ (for a sufficiently small fixed constant $c' > 0$) such that $\rho(A) \ge \rho(A^*)/n^{1-\epsilon}$, where $A^*$ is the optimum solution of size $k$.
The hardness results naturally apply to any diffusion model that subsumes the DIC or CIC models. However, an extension to the DLT model is not immediate: the construction relies crucially on having many edges of probability 1 into a single node, which is not allowed under the DLT model.
4.1 Bicriteria Approximation Algorithm
Theorem 2 implies that to obtain any nontrivial approximation guarantee, one needs to allow the algorithm to exceed the seed set size by at least a factor of $\ln|\Sigma|$. In this section, we therefore focus on such bicriteria approximation results, by slightly modifying an algorithm of Krause et al. [27].
The slight difference lies in how the submodular coverage subproblem is solved. Both [27] and the Greedy Mintss algorithm [22] greedily add elements. However, the Greedy Mintss algorithm adds elements until the desired submodular objective is attained up to an additive term, while [27] requires exact coverage. Moreover, directly considering real-valued submodular functions instead of going through fractional values leads to a more direct analysis of the Greedy Mintss algorithm [22].
The high-level idea of the algorithm is as follows. Fix a real value $c$, and define $\bar{\rho}_{\sigma,c}(A) = \min\left(c, \frac{\sigma(A)}{\sigma(A^g_\sigma)}\right)$ and $\bar{F}_c(A) = \sum_{\sigma \in \Sigma} \bar{\rho}_{\sigma,c}(A)$. Then, $\rho^g(A) \ge c$ if and only if $\bar{\rho}_{\sigma,c}(A) = c$ for all $\sigma \in \Sigma$. But because by definition, $\bar{\rho}_{\sigma,c}(A) \le c$ for all $\sigma$, the latter is equivalent to $\bar{F}_c(A) = |\Sigma| \cdot c$. (If any term in the sum is less than $c$, no other term can ever compensate for it, because they are capped at $c$.)
Because $\bar{F}_c$ is a nonnegative linear combination of the monotone submodular functions $\bar{\rho}_{\sigma,c}$ (truncating a monotone submodular function at a constant preserves both properties), it is itself a monotone and submodular function. This enables the use of a greedy approximation algorithm to find an (approximately) smallest set $A$ with $\bar{F}_c(A) = |\Sigma| \cdot c$. Let $\beta = 1 + \ln|\Sigma| + \ln(3/\epsilon)$ denote the greedy algorithm's size blow-up. If $A$ has size at most $\beta k$, this constitutes a satisfactory solution, and we move on to larger values of $c$. If $A$ has size more than $\beta k$, then the greedy algorithm's approximation guarantee ensures that there is no satisfactory set of size at most $k$. Hence, we move on to smaller values of $c$. For efficiency, the search for the right value of $c$ is done with binary search and a specified precision parameter.
A slight subtlety in the greedy algorithm is that $\bar{F}_c$ could take on fractional values. Thus, instead of trying to meet the bound $|\Sigma| \cdot c$ precisely, we aim for a value of $|\Sigma| \cdot c - \epsilon c/3$. Then, the analysis of the Greedy Mintss algorithm of Goyal et al. [22] (of which our algorithm is an unweighted special case) applies. The resulting algorithm Saturate Greedy is given as Algorithm 1. The simple greedy subroutine, a special case of the Greedy Mintss algorithm, is given as Algorithm 2.
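The binary search over $c$ with a greedy covering subroutine can be sketched as follows. This is a hypothetical sketch, not Algorithms 1 and 2 verbatim: it assumes the search starts at $c_{\min} = 0$, $c_{\max} = 1$ (enough for the $\rho$ guarantee, since $\rho \le 1$), uses the slack $\epsilon c/3$ and blow-up $\beta$ described above, and replaces influence estimation by exactly computable toy coverage functions. All function names are illustrative.

```python
import math


def greedy_max(f, ground, k):
    """Plain greedy for max f(A) s.t. |A| <= k; yields the scaling sets A^g_sigma."""
    A = []
    for _ in range(k):
        cands = [x for x in ground if x not in A]
        if not cands:
            break
        A.append(max(cands, key=lambda x: f(A + [x]) - f(A)))
    return A


def greedy_mintss(F, ground, target):
    """Greedy covering: add max-gain elements until F(A) >= target."""
    A = []
    while F(A) < target:
        cands = [x for x in ground if x not in A]
        if not cands:
            break
        best = max(cands, key=lambda x: F(A + [x]) - F(A))
        if F(A + [best]) - F(A) <= 0:
            break  # no element makes progress
        A.append(best)
    return A


def saturate_greedy(sigmas, ground, k, eps):
    scale = [s(greedy_max(s, ground, k)) for s in sigmas]  # sigma(A^g_sigma)
    beta = 1 + math.log(len(sigmas)) + math.log(3 / eps)   # allowed seed blow-up

    def F_bar(c):
        # truncated objective: sum over sigma of min(c, sigma(A) / sigma(A^g_sigma))
        return lambda A: sum(min(c, s(A) / z) for s, z in zip(sigmas, scale))

    c_min, c_max, best = 0.0, 1.0, []
    while c_max - c_min > eps:
        c = (c_min + c_max) / 2
        A = greedy_mintss(F_bar(c), ground, len(sigmas) * c - eps * c / 3)
        if len(A) > beta * k:
            c_max = c  # no set of size <= k reaches rho^g >= c
        else:
            c_min, best = (1 - eps / 3) * c, A  # A certifies rho^g(A) >= (1 - eps/3) c
    return best


def rho_g(A, sigmas, ground, k):
    return min(s(A) / s(greedy_max(s, ground, k)) for s in sigmas)


def coverage_fn(covers):
    return lambda A: len(set().union(*(covers[x] for x in A)))


ground = ["x", "y", "z"]
sigmas = [coverage_fn({"x": {1, 2, 3}, "y": {1}, "z": {4}}),
          coverage_fn({"x": {1}, "y": {1, 2, 3}, "z": {4}})]
A_hat = saturate_greedy(sigmas, ground, k=1, eps=0.1)
print(A_hat, rho_g(A_hat, sigmas, ground, 1))  # ['x', 'y'] 1.0
```

With $k = 1$, no single node serves both functions well, so the bicriteria solution uses two seeds (within the $\beta k$ budget) to reach $\rho^g = 1$. In practice, the CELF-style lazy evaluation mentioned under Implementation would replace the plain `max` calls.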
By combining the discussion at the beginning of this section (about optimizing vs. ) with the analysis of Krause et al. [27] and Goyal et al. [22], we obtain the following approximation guarantee.
Theorem 3
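Let $\epsilon > 0$. Saturate Greedy finds a seed set $\hat{A}$ of size $|\hat{A}| \le (1 + \ln|\Sigma| + \ln(3/\epsilon))\,k$ with
$$\rho(\hat{A}) \ge (1 - 1/e) \cdot \rho(A^*) - \epsilon,$$
where $A^*$ is an optimal robust seed set of size $k$.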

Algorithm 1 uses Algorithm 2 (Greedy Mintss) as a subroutine to find a solution $A$ such that $\bar{F}_c(A) \ge |\Sigma| \cdot c - \epsilon c/3$ and $|A| \le (1 + \ln(3|\Sigma|/\epsilon)) \cdot |A^*_c|$, where $A^*_c$ is a smallest solution guaranteeing $\bar{F}_c(A^*_c) = |\Sigma| \cdot c$. (Technically, the guarantees on Greedy Mintss depend on being able to evaluate $\bar{F}_c$ precisely [22, Theorem 1]. However, Theorem 2 of [22] states that by obtaining sufficiently accurate approximations to $\bar{F}_c$, one retains a correspondingly weakened guarantee whose loss vanishes as the approximation accuracy improves. For influence coverage functions, arbitrarily close approximations to $\sigma(A)$ can be obtained by Monte Carlo simulations. We therefore ignore the issue of sampling accuracy in this article, and perform the analysis as though $\sigma(A)$ could be evaluated precisely. Otherwise, the approximations carry through in a straightforward way, leading to additional multiplicative factors arbitrarily close to 1.)
In light of the general outline and motivation for the Saturate Greedy algorithm given above, it mostly remains to verify how the guarantees for Greedy Mintss and the balancing of the parameters carry through.
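We will show that throughout the algorithm (or more precisely: the binary search), $c_{\min}$ always remains a lower bound on the solution for the problem with the relaxed cardinality constraint, while $c_{\max}$ remains an upper bound on the solution for the original problem. In other words, there is no set $A$ of cardinality at most $k$ with $\rho^g(A) \ge c_{\max}$, and there is a set $A$ of cardinality at most $\beta k$ with $\rho^g(A) \ge c_{\min}$, where $\beta = 1 + \ln|\Sigma| + \ln(3/\epsilon)$.
To show this claim, consider the set $A$ returned by the Greedy Mintss algorithm. If $|A| > \beta k$, the guarantee for Greedy Mintss implies that $|A^*_c| > k$, where $A^*_c$ is the optimal solution for the instance. Because $|A^*_c| > k$, the value $c$ is not feasible, and the algorithm is correct in setting $c_{\max}$ to $c$.
Otherwise, $|A| \le \beta k$, and the guarantee of Greedy Mintss implies that $\bar{F}_c(A) \ge |\Sigma| \cdot c - \epsilon c/3$. Because each $\bar{\rho}_{\sigma',c}(A) \le c$ by definition, we get for all $\sigma \in \Sigma$,
$$\bar{\rho}_{\sigma,c}(A) \ge \bar{F}_c(A) - (|\Sigma| - 1)\,c \ge (1 - \epsilon/3)\,c,$$
and therefore $\rho^g(A) \ge (1 - \epsilon/3)\,c$. This confirms the correctness of assigning $c_{\min} = (1 - \epsilon/3)\,c$.
Since we do not set $c_{\min} = c$, we need to briefly verify termination of the binary search. For any iteration in which we update $c_{\min}$, let $\Delta = c_{\max} - c_{\min}$. When the new $c_{\min}$ is set to $(1 - \epsilon/3)\,c$ with $c = (c_{\min} + c_{\max})/2$, we get that $c_{\max} - (1 - \epsilon/3)\,c = \Delta/2 + \epsilon c/3$, which is at most $5\Delta/6$ whenever $c \le 1$ and $\Delta \ge \epsilon$. Hence, the size of the interval keeps decreasing geometrically, and the binary search terminates in $O(\log(1/\epsilon))$ iterations.
At the time of termination, we obtain that $\rho^g(\hat{A}) \ge c_{\min} \ge c_{\max} - \epsilon \ge \max_{|A| \le k} \rho^g(A) - \epsilon$. Combining this bound with the factor of $1 - 1/e$ we lost due to approximating $\rho$ with $\rho^g$ (Inequality (1)), we obtain the claim of the theorem.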
Theorem 3 holds very broadly, so long as all influence functions are monotone and submodular. This includes the DIC, DLT, and CIC models, and allows mixing influence functions from different model classes.
4.2 Simple Heuristics
In addition to the Saturate Greedy algorithm, our experiments use two natural baselines. The first is a simple greedy algorithm Single Greedy which adds elements to $A$ one by one, always choosing the one maximizing $\rho^g(A \cup \{v\})$. While this heuristic has provable guarantees when the objective function is submodular, this is not the case for the minimum of submodular functions.
The second heuristic is to run a greedy algorithm for each objective function $\sigma \in \Sigma$ separately, and choose the best of the resulting solutions. Those solutions are exactly the sets $A^g_\sigma$ defined earlier in this section. Thus, the algorithm consists of choosing $\arg\max_{A \in \{A^g_\sigma \mid \sigma \in \Sigma\}} \rho^g(A)$. We call the resulting algorithm All Greedy.
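Both heuristics are short wrappers around the plain greedy algorithm. The sketch below implements them on the same kind of hypothetical two-function coverage instance used earlier (all names illustrative, $\rho^g$ computed exactly). On this instance All Greedy's per-function solutions each neglect the other function, while Single Greedy happens to find the hedged set.

```python
def greedy_max(f, ground, k):
    chosen = []
    for _ in range(k):
        cands = [x for x in ground if x not in chosen]
        if not cands:
            break
        chosen.append(max(cands, key=lambda x: f(chosen + [x]) - f(chosen)))
    return chosen


def make_rho_g(sigmas, ground, k):
    """rho^g(A) = min over sigma of sigma(A) / sigma(A^g_sigma)."""
    scale = [s(greedy_max(s, ground, k)) for s in sigmas]
    return lambda A: min(s(A) / z for s, z in zip(sigmas, scale))


def single_greedy(sigmas, ground, k):
    """Greedily maximize rho^g itself, one node at a time."""
    return greedy_max(make_rho_g(sigmas, ground, k), ground, k)


def all_greedy(sigmas, ground, k):
    """Greedy for each sigma separately; keep the candidate with the largest rho^g."""
    rho_g = make_rho_g(sigmas, ground, k)
    return max((greedy_max(s, ground, k) for s in sigmas), key=rho_g)


def coverage_fn(covers):
    return lambda A: len(set().union(*(covers[x] for x in A)))


ground = ["x", "y", "z"]
sigmas = [coverage_fn({"x": {1, 2, 3}, "y": {1}, "z": {4}}),
          coverage_fn({"x": {1}, "y": {1, 2, 3}, "z": {4}})]
rho_g = make_rho_g(sigmas, ground, 2)
print(single_greedy(sigmas, ground, 2), all_greedy(sigmas, ground, 2))  # ['x', 'y'] ['x', 'z']
```

Here Single Greedy reaches $\rho^g = 0.75$ and All Greedy only $0.5$; the worst-case construction that follows shows that Single Greedy can fail badly as well.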
In the worst case, both Single Greedy and All Greedy can perform arbitrarily badly, as seen by the following class of examples with a given parameter . The example consists of instances of the DIC model for the following graph with nodes (where ). The graph comprises a directed complete bipartite graph with nodes on one side and nodes on the other side, as well as separate edges . The edges have activation probability 1 in all instances. In the bipartite graph, in the scenario, only the edges leaving node have probability 1, while all others have 0 activation probability.
The optimal solution for Robust Influence Maximization is to select all nodes , since in every scenario one of them will succeed in activating the nodes . The resulting objective value will be close to 1. However, All Greedy picks only one node and the remaining nodes as , while Single Greedy picks all of the . Thus, both All Greedy and Single Greedy will have robust influence close to 0 as grows large. Empirical experiments confirm this analysis. For example, for and , Saturate Greedy achieves , while Single Greedy and All Greedy only achieve and , respectively.
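The bad example can be checked numerically. Because every edge probability is 0 or 1, influence is simply the number of nodes reachable from the seeds, so the per-scenario optima can be brute-forced on a small instance. The sizes (m = k = 3 scenarios and seeds, M = 30 nodes on the large side of the bipartite graph, k separate edges) and the node names x_i, y_t, u_j, v_j are illustrative assumptions. The sketch evaluates the seed sets that the analysis attributes to each method rather than rerunning the heuristics.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set

Graph = Dict[str, Set[str]]  # live edges only (probability 1)


def build_scenarios(m: int, M: int, k: int) -> List[Graph]:
    """One deterministic DIC instance per scenario i (all probabilities are 0 or 1)."""
    scenarios = []
    for i in range(m):
        adj: Graph = {f"u{j}": {f"v{j}"} for j in range(k)}  # live in every scenario
        adj[f"x{i}"] = {f"y{t}" for t in range(M)}           # only x_i's edges are live
        scenarios.append(adj)
    return scenarios


def reach(adj: Graph, seeds: Iterable[str]) -> int:
    """Number of nodes reachable from the seeds; equals the influence here."""
    seen = set(seeds)
    stack = list(seen)
    while stack:
        for w in adj.get(stack.pop(), ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen)


def scenario_optima(scenarios: List[Graph], nodes: List[str], k: int) -> List[int]:
    """Best influence achievable with k seeds in each scenario (brute force)."""
    return [max(reach(adj, T) for T in combinations(nodes, k)) for adj in scenarios]


def robust_value(scenarios: List[Graph], optima: List[int], seeds: List[str]) -> float:
    """Minimum over scenarios of influence(seeds) / scenario optimum."""
    return min(reach(adj, seeds) / opt for adj, opt in zip(scenarios, optima))


m = k = 3
M = 30
nodes = ([f"x{i}" for i in range(m)] + [f"y{t}" for t in range(M)]
         + [f"u{j}" for j in range(k)] + [f"v{j}" for j in range(k)])
scenarios = build_scenarios(m, M, k)
optima = scenario_optima(scenarios, nodes, k)
print(robust_value(scenarios, optima, ["x0", "x1", "x2"]))  # all x_i: 33/35 ≈ 0.943
print(robust_value(scenarios, optima, ["x0", "u0", "u1"]))  # All Greedy's choice: 5/35 ≈ 0.143
print(robust_value(scenarios, optima, ["u0", "u1", "u2"]))  # Single Greedy's choice: 6/35 ≈ 0.171
```

Increasing M drives the two heuristic values toward 0, while selecting all x_i keeps the robust value close to 1.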
Implementation
The most time-consuming step in all of the algorithms is the estimation of influence coverage, given a seed set. Naïve estimation by Monte Carlo simulation could lead to a very inefficient implementation, and the problem is even more pronounced than in traditional Influence Maximization, because we must estimate the influence under multiple diffusion settings. Instead, we use the ConTinEst algorithm of Du et al. [15] for fast influence estimation under the CIC model. For the DIC model, we generalize the approach of Du et al. To accelerate the Greedy Mintss algorithm, we also apply the CELF optimization [31] in all cases. Thanks to the fast influence estimation, one can analytically derive a running time for all three algorithms that is linear in both the size of the network and the number of influence functions. This is borne out by detailed experiments in Section 5.4.
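The CELF optimization exploits submodularity: a node's marginal gain can only shrink as the seed set grows, so previously computed gains are valid upper bounds and many re-evaluations can be skipped. Below is a minimal lazy-greedy sketch for a generic monotone submodular set function `f`, a hypothetical stand-in for an influence estimator; it also counts how often `f` is evaluated.

```python
import heapq
from typing import Callable, FrozenSet, Hashable, Sequence, Tuple

Node = Hashable


def celf(f: Callable[[FrozenSet[Node]], float], nodes: Sequence[Node],
         k: int) -> Tuple[FrozenSet[Node], int]:
    """Lazy greedy (CELF): returns the chosen set and the number of f-evaluations."""
    chosen: FrozenSet[Node] = frozenset()
    current = f(chosen)
    # Heap entries: (-marginal gain, tie-breaker, node, |chosen| when the gain was computed).
    heap = [(-(f(frozenset([v])) - current), i, v, 0) for i, v in enumerate(nodes)]
    heapq.heapify(heap)
    evaluations = 1 + len(heap)
    while len(chosen) < k and heap:
        neg_gain, i, v, stamp = heapq.heappop(heap)
        if stamp == len(chosen):
            # Gain is up to date; by submodularity no stale entry can beat it.
            chosen = chosen | {v}
            current -= neg_gain
        else:
            # Stale upper bound: recompute the marginal gain and reinsert.
            gain = f(chosen | {v}) - current
            evaluations += 1
            heapq.heappush(heap, (-gain, i, v, len(chosen)))
    return chosen, evaluations


cov = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
f = lambda S: len(set().union(*(cov[v] for v in S)))
S, evals = celf(f, ["a", "b", "c", "d"], 2)
print(sorted(S), evals)  # ['a', 'c'] 6
```

In the second round, only node a is re-evaluated, because its refreshed gain still tops the stale bounds of b and d; plain greedy would re-evaluate every remaining node in each round.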
5 Experiments
We empirically evaluate the Saturate Greedy algorithm and the Single Greedy and All Greedy heuristics. Our goal is twofold: (1) Evaluate how well Saturate Greedy and the heuristics perform on realistic instances. (2) Qualitatively understand the difference between robustly and non-robustly optimized solutions.
Our experiments are all performed on real-world data sets. The exception is the scalability experiments in Section 5.4, which benefit from the controlled environment of synthetic networks. The data sets span the range of different causes for uncertainty, namely: (1) influences are learned from cascades for different topics; (2) influences are learned with different modeling assumptions; (3) influences are only inferred to lie within intervals (the Perturbation Interval model).
5.1 Different Networks
We first focus on the case in which the diffusion model is kept constant: we use the DIC model, with parameters specified below. Different objective functions are obtained from observing cascades (1) on different topics, for which we use Twitter retweet networks, and (2) at different times, for which we use MemeTracker diffusion network snapshots.
The Twitter networks are extracted from a complete collection of tweets between Jan. 2010 and Feb. 2010. We treat each hashtag as a separate cascade, and extract the top 100/250 users with the most tweets containing these hashtags into two datasets (Twitter100 and Twitter250). The hashtags are manually grouped into five categories of about 70–80 hashtags each, corresponding to major events/topics during the data collection period. The five groups are: Haiti earthquake (Haiti), Iran election (Iran), Technology, US politics, and the Copenhagen climate change summit (Climate). Examples of hashtags in each group are shown in Table 1. Whenever user retweets a post of user with a hashtag belonging to category , we insert an edge with activation probability 1 from to in graph . The union of all these edges specifies the influence function.
Our decision to treat each hashtag as a separate cascade is meant to capture the fact that most hashtags “spread” across Twitter when one user sees another use it and starts posting with it as well. The grouping of similar hashtags captures that a user who may influence another to use the hashtag, say, #teaparty, would likely also influence the other user to a similar extent to use, say, #liberty. The pruning of the data sets was necessary because most users showed very limited activity. Naturally, if our goal were to evaluate algorithmic efficiency rather than performance with respect to the objective function, we would focus on larger networks, even if those networks were less easily visualized.
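A minimal sketch of how such per-category graphs could be assembled, paired with a naive Monte Carlo estimate of DIC influence of the kind the Implementation paragraph describes as too slow at scale. Assumptions: retweets arrive as hypothetical (retweeter, original_author, hashtag) triples, the hashtag-to-category map is given, edges point from the original author to the retweeter (the direction in which influence is assumed to flow), and every edge uses a single uniform probability `p`, as in the per-data-set values of Table 2.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def category_graphs(retweets: Iterable[Tuple[str, str, str]],
                    category_of: Dict[str, str]) -> Dict[str, Graph]:
    """Build one directed graph per hashtag category from retweet triples."""
    graphs: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for retweeter, author, tag in retweets:
        category = category_of.get(tag)
        if category is not None and retweeter != author:
            # Assumed direction: the original author influences the retweeter.
            graphs[category][author].add(retweeter)
    return {c: dict(adj) for c, adj in graphs.items()}


def dic_influence(adj: Graph, seeds: Iterable[str], p: float, runs: int = 1000,
                  rng: Optional[random.Random] = None) -> float:
    """Naive Monte Carlo estimate of expected DIC influence, uniform edge probability p."""
    rng = rng or random.Random(0)
    seed_list = list(seeds)
    total = 0
    for _ in range(runs):
        active = set(seed_list)
        frontier = list(active)
        while frontier:
            u = frontier.pop()
            for w in adj.get(u, ()):
                # Each newly activated node gets one attempt per outgoing edge.
                if w not in active and rng.random() < p:
                    active.add(w)
                    frontier.append(w)
        total += len(active)
    return total / runs


retweets = [("bob", "alice", "#iran"), ("carol", "bob", "#tehran"),
            ("dave", "alice", "#obama")]
category_of = {"#iran": "Iran", "#tehran": "Iran", "#obama": "US politics"}
graphs = category_graphs(retweets, category_of)
print(graphs["Iran"])  # {'alice': {'bob'}, 'bob': {'carol'}}
print(dic_influence(graphs["Iran"], ["alice"], p=0.2))
```

For this three-node chain, the exact expectation is 1 + 0.2 + 0.2² = 1.24, so the estimate should land close to that value.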
Table 1: Examples of hashtags in each category.
Category    | Hashtags
------------|-------------------------------------------
Iran        | #iranelection, #iran, #16azar, #tehran
Haiti       | #haiti, #haitiquake, #supphaiti, #cchaiti
Technology  | #iphone, #mac, #microsoft, #tech
US politics | #obama, #conservative, #teaparty, #liberty
Climate     | #copenhagen, #cop15, #climatechange
The MemeTracker dataset [29] contains memes extracted from the blogosphere and mainstream media sites between Aug. 2009 and Feb. 2010. In our experiments, we extract the 2000/5000 sites with the most posting activity across the time period we study (Meme2000 and Meme5000). We extract six separate diffusion networks, one for each month. The network for month contains all the directed links that were posted in month (in reverse order, i.e., if links to , then we add a link from to ), with activation probability 1. It thus defines the influence function.
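The month-by-month construction, including the reversal of link direction, can be sketched as follows. The (month, citing_site, cited_site) record format and the function name are hypothetical assumptions; repeated links within a month collapse into one edge, we choose to drop self-links, and edge probabilities are attached separately.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def monthly_graphs(links: Iterable[Tuple[Hashable, str, str]]) -> Dict[Hashable, Graph]:
    """Build one diffusion graph per month from (month, citing, cited) link records."""
    graphs: Dict[Hashable, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for month, citing, cited in links:
        if citing != cited:  # drop self-links
            # Reverse the hyperlink: the cited site is treated as influencing the citing site.
            graphs[month][cited].add(citing)
    return {month: dict(adj) for month, adj in graphs.items()}


links = [(1, "blogA", "news"), (1, "blogB", "news"), (1, "blogA", "news"),
         (2, "news", "blogA")]
print(monthly_graphs(links))
```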
The parameters of the DIC model used for this set of experiments are summarized in Table 2.
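Table 2: DIC model parameters for each data set.
Data set   | Edge activation probability | # Seeds
-----------|-----------------------------|--------
Twitter100 | 0.2                         | 10
Twitter250 | 0.1                         | 20
Meme2000   | 0.05                        | 50
Meme5000   | 0.05                        | 100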
Recalling that in the worst case, a relaxation in the number of seeds is required to obtain robust seed sets, we allow all algorithms to select more seeds than the solution they are compared against. Specifically, we report results in which the algorithms may select , and seeds, respectively. The reported results are averaged over three independent runs of each of the algorithms.
Results: Performance
The aggregate performance of the different algorithms on the four data sets is shown in Figure 1.
The first main insight is that (in the instances we study), when allowed to over-select seeds by 50%, all three algorithms achieve a robust influence of at least 1.0. In other words, 50% more seeds let the algorithms perform as though they knew exactly which of the (adversarially chosen) diffusion settings was the true one. This suggests that the networks in our data sets share many similarities that make nodes influential in one network also (mostly) influential in the other networks. This interpretation is consistent with the observation that the baseline heuristics perform similarly to (and in one case better than) the Saturate Greedy algorithm. Notice, however, that when selecting just seeds, Saturate Greedy does perform best (though only by a small margin) among the three algorithms. This suggests that keeping robustness in mind may be more crucial when the algorithm does not get to compensate with a larger number of seeds.
Results: Visualization
To further illustrate the trade-offs between robust and non-robust optimization, we visualize the seeds selected by Saturate Greedy (robust seeds) compared to seeds selected non-robustly based on only one diffusion setting. For legibility, we focus only on the Twitter250 data set, and only plot four of the five networks. (The fifth network is very sparse, and thus not particularly interesting.)
Figure 2 compares the seeds selected by Saturate Greedy with those (approximately) maximizing the influence for the Iran network. Notice that Saturate Greedy focuses mostly (though not exclusively) on the densely connected core of the network (at the center), while the Iran-specific optimization also exploits the dense regions on the left and at the bottom. These regions are much less densely connected in the US politics and Climate networks, while the core remains fairly densely connected, leading the Saturate Greedy solution to be somewhat more robust.
Similarly, Figure 3 compares the Saturate Greedy seeds (which are the same as in Figure 2) with seeds for the Climate network. The trend here is exactly the opposite. The seeds selected based only on the Climate network are exclusively in the core, because the other parts of the Climate network are barely connected. On the other hand, the robust solution picks a few seeds from the clusters at the bottom, left, and right, which are present in other networks. These seeds lead to extra influence in those networks, and thus more robustness.
5.2 Different Diffusion Models
In choosing a diffusion model, there is little convincing empirical work guiding the choice of a model class (such as CIC, DIC, or threshold models) or of distributional assumptions for model parameters (such as edge delay). A possible solution is to optimize robustly with respect to these different possible choices.
In this section, we evaluate such an approach. Specifically, we perform two experiments: (1) learning the CIC influence network under different parametric assumptions about the delay distribution, and (2) learning the influence network under different models of influence (CIC, DIC, DLT). We again use the MemeTracker dataset, restricting ourselves to the data from August 2008 and the 500 most active users. We use the MultiTree algorithm of Gomez-Rodriguez et al. [20] to infer the diffusion network from the observed cascades. This algorithm requires a parametric assumption for the edge delay distribution. We infer ten different networks corresponding to the Exponential distribution with parameters 0.05, 0.1, 0.2, 0.5, 1.0, and to the Rayleigh distribution with parameters 0.5, 1, 2, 3, 4. The length of the observation window is set to 1.0.
We then use the three algorithms to perform robust influence maximization for seeds, again allowing the algorithms to exceed the target number of vertices. The influence model for each graph is the CIC model with the same parameters that were used to infer the graphs.
The performance of the algorithms is shown in Figure 4(a). All methods achieve satisfactory results in the experiment; this is again due to high similarity between the different diffusion settings inferred with different parameters.
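Purely as an illustration of why the parametric delay assumption matters, the sketch below computes, for each listed parameter, the probability that a single transmission delay falls within the observation window of length 1.0. It assumes each parameter is the rate α of an exponential density α e^{-αt} or of a Rayleigh density α t e^{-αt²/2}; the text does not spell out the parameterization, so this reading is an assumption, and the function names are hypothetical.

```python
import math


def exponential_cdf(alpha: float, t: float) -> float:
    """P(delay <= t) for an exponential delay with rate alpha."""
    return 1.0 - math.exp(-alpha * t)


def rayleigh_cdf(alpha: float, t: float) -> float:
    """P(delay <= t) for the density alpha * t * exp(-alpha * t**2 / 2) (assumed form)."""
    return 1.0 - math.exp(-alpha * t * t / 2.0)


T = 1.0  # observation window length from the text
for a in (0.05, 0.1, 0.2, 0.5, 1.0):
    print(f"Exponential, alpha={a}: P(delay <= T) = {exponential_cdf(a, T):.3f}")
for a in (0.5, 1, 2, 3, 4):
    print(f"Rayleigh,    alpha={a}: P(delay <= T) = {rayleigh_cdf(a, T):.3f}")
```

Under this reading, the same observed cascade is explained by very different per-edge transmission chances depending on the assumed family and parameter, which is one source of the model uncertainty studied here.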
For the second experiment, we investigate the robustness across different classes of diffusion models. We construct three instances, of the DIC, DLT, and CIC models, from the ground-truth diffusion network among the 500 most active users. For the DIC model, we set the activation probability uniformly to . For the DLT model, we follow [25] and set the weight of each edge $(u, v)$ to $1/d_v$, where $d_v$ is the in-degree of node $v$. For the CIC model, we use an exponential distribution with parameter and an observation window of length . We perform robust influence maximization for seeds and again allow the algorithms to exceed the target number of seeds.
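The DLT weighting can be sketched as follows: each edge (u, v) gets weight 1/d_v, each node draws a threshold uniformly from [0, 1] as in the linear threshold model of [25], and a node activates once the total weight of its active in-neighbors reaches its threshold. This is a minimal Monte Carlo sketch with hypothetical function names, not necessarily the estimator used in the experiments.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]


def dlt_weights(edges: Iterable[Edge]) -> Dict[Edge, float]:
    """Weight 1/d_v on each distinct edge (u, v), where d_v is the in-degree of v."""
    unique = set(edges)
    indeg: Dict[str, int] = defaultdict(int)
    for _, v in unique:
        indeg[v] += 1
    return {(u, v): 1.0 / indeg[v] for u, v in unique}


def dlt_influence(edges: Iterable[Edge], seeds: Iterable[str],
                  runs: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of expected DLT influence with uniform random thresholds."""
    weights = dlt_weights(edges)
    in_nbrs: Dict[str, List[str]] = defaultdict(list)
    for u, v in weights:
        in_nbrs[v].append(u)
    seed_set = set(seeds)
    nodes = sorted({x for e in weights for x in e} | seed_set)  # fixed order for reproducibility
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        theta = {v: rng.random() for v in nodes}  # one uniform threshold per node
        active = set(seed_set)
        changed = True
        while changed:  # repeat until no further node activates
            changed = False
            for v in nodes:
                if v not in active and sum(
                        weights[(u, v)] for u in in_nbrs[v] if u in active) >= theta[v]:
                    active.add(v)
                    changed = True
        total += len(active)
    return total / runs


print(dlt_weights([("a", "c"), ("b", "c"), ("c", "d")]))
print(dlt_influence([("a", "c"), ("b", "c")], ["a"]))
```

In the two-parent example, node c activates exactly when its threshold is at most 1/2, so the estimate should be close to 1.5.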
The results are shown in Figure 4(b). Similarly to the case of different estimated parameters, all methods achieve satisfactory results in the experiment due to the high similarity between the diffusion models. Our results raise the intriguing question of which types of networks would be prone to significant differences in algorithmic performance based on which model is used for network estimation.
5.3 Networks sampled from the Perturbation Interval model
To investigate the performance when model parameters can only be placed inside “confidence intervals” (i.e., the Perturbation Interval model), we carry out experiments on two networks, MemeTracker and STOC-FOCS.
The MemeTracker network is extracted from the MemeTracker data set using the ConNIe algorithm [34] to infer the (fractional) parameters (activation probabilities) of a DIC model from the same 500-node MemeTracker data set used in the previous section. We also ran experiments on a multigraph extracted from coauthorship of published papers in the conferences STOC and FOCS from 1964–2001. Each node in that network is a researcher with at least one publication in one of the conferences. For each multi-author paper, we add a complete undirected graph among the authors. We compress parallel edges into a single edge with weight and set the activation probability . If , we truncate its value to . Following the approach of [24], for both networks, we assign “confidence intervals” , where the are the inferred activation probabilities. For experiments on the MemeTracker network, we set , while we use a coarse grid for the experiments on the large graph STOC-FOCS with .
While Lemma 1 guarantees that the worst-case instances have activation probabilities at the endpoints of their intervals, this still leaves exponentially many candidate functions (two choices per edge), too many to include. We generate an instance for our experiments by sampling 10 of these functions uniformly, i.e., by independently setting each edge's activation probability to either the lower or the upper endpoint of its interval. This collection is augmented by two more instances: one where all edge probabilities are at their lower endpoints, and one where all are at their upper endpoints. Notice that with the inclusion of these two instances, the All Greedy heuristic generalizes the LUGreedy algorithm by Chen et al. [6], and it might provide strictly better solutions on the selected instances because it also explicitly considers the sampled instances. The algorithms get to select 20 seed nodes; note that in these experiments, we are not considering a bicriteria approximation.
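The sampling procedure can be sketched in a few lines. The intervals are taken as given input (the interval formula above is not reproduced here), and the edge keys, interval values, and function name are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Optional, Tuple

Edge = Tuple[Hashable, Hashable]


def extreme_instances(intervals: Dict[Edge, Tuple[float, float]], samples: int = 10,
                      rng: Optional[random.Random] = None) -> List[Dict[Edge, float]]:
    """Sample endpoint-valued instances, then append the all-lower and all-upper ones."""
    rng = rng or random.Random(0)
    instances = [
        # Each edge independently takes its lower or upper endpoint with probability 1/2.
        {e: rng.choice((lo, hi)) for e, (lo, hi) in intervals.items()}
        for _ in range(samples)
    ]
    instances.append({e: lo for e, (lo, _) in intervals.items()})  # all lower endpoints
    instances.append({e: hi for e, (_, hi) in intervals.items()})  # all upper endpoints
    return instances


intervals = {("a", "b"): (0.1, 0.3), ("b", "c"): (0.0, 0.2)}
for inst in extreme_instances(intervals, samples=3):
    print(inst)
```

The last two instances are the ones that LUGreedy-style methods consider; the sampled ones add further endpoint combinations.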
The results are shown in Figures 5(a) and 5(b). Contrary to the previous results, when there is a lot of uncertainty about the edge parameters (relative interval size 100% in both networks), the Saturate Greedy algorithm more clearly outperforms the Single Greedy and All Greedy heuristics. Thus, robust optimization does appear to become necessary when there is a lot of uncertainty about the model’s parameters.
Notice that the evaluation of the algorithms’ seed sets is performed only with respect to the sampled influence functions, not with respect to all functions. Whether one can efficiently identify a worst-case parameter setting for a given seed set is an intriguing open question. Absent this ability, we cannot efficiently guarantee that the solutions are actually good with respect to all parameter settings.
5.4 Scalability
To evaluate the scalability of the algorithms, we depart from real-world data sets in order to obtain a controlled environment. We generate networks using the Kronecker graph model [30] with either random, core-periphery, or hierarchical-community structures. For each type, we generate a set of networks of sizes . We use the DIC model with activation probability set to , and select nodes. The running times of the three algorithms are shown in Figure 6 and Figure 7. In Figure 6, we fix the number of networks to five and vary the size of each network; in Figure 7, we fix the size of the networks to and vary the number of networks. The graphs show that the heuristics are faster than the Saturate Greedy algorithm by about a factor of ten, but all three algorithms scale linearly both in the size of the graph and the number of networks, due to the fast influence estimation method.
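For readers unfamiliar with the Kronecker graph model [30], here is a minimal stochastic Kronecker sampler. The 2×2 initiator matrices are illustrative values commonly used for these three structure types, assumed here rather than taken from the text, and the naive O(n² · levels) loop is only suitable for small graphs; large-scale generators use faster sampling.

```python
import random
from typing import Dict, List, Tuple

# Illustrative 2x2 initiator matrices (assumed values, not taken from the text).
INITIATORS: Dict[str, List[List[float]]] = {
    "random": [[0.5, 0.5], [0.5, 0.5]],
    "core-periphery": [[0.9, 0.5], [0.5, 0.3]],
    "hierarchical-community": [[0.9, 0.1], [0.1, 0.9]],
}


def kronecker_graph(P: List[List[float]], levels: int,
                    rng: random.Random) -> Tuple[int, List[Tuple[int, int]]]:
    """Naive stochastic Kronecker sampler: n = 2**levels nodes, O(n^2 * levels) time."""
    n = 2 ** levels
    edges: List[Tuple[int, int]] = []
    for u in range(n):
        for v in range(n):
            if u == v:
                continue  # no self-loops
            prob = 1.0
            for b in range(levels):
                # Edge probability is the product of initiator entries, one per bit.
                prob *= P[(u >> b) & 1][(v >> b) & 1]
            if rng.random() < prob:
                edges.append((u, v))
    return n, edges


rng = random.Random(42)
for name, P in INITIATORS.items():
    n, edges = kronecker_graph(P, levels=6, rng=rng)
    print(f"{name}: {n} nodes, {len(edges)} directed edges")
```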
6 Future Work
Our work marks an early step, rather than the conclusion, in devising robust algorithms for social network tasks, and more specifically Influence Maximization. An interesting unresolved question is whether one can efficiently find an (approximately) worst-case influence function in the Perturbation Interval model. This would allow us to empirically evaluate the performance of natural heuristics for the Perturbation Interval model, such as randomly sampling a small number of influence functions. Furthermore, it would allow us to design “column generation”-style algorithms for the Perturbation Interval model, where we alternate between finding a near-optimal seed set for all influence functions encountered so far, and finding a worst-case influence function for the current seed set, which will then be added to the encountered functions.
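The column-generation loop can at least be illustrated on tiny instances, where both oracles are replaced by brute force. This sketch rests on several assumptions: DIC influence is computed exactly by enumerating all live-edge subgraphs, the worst-case oracle searches only endpoint assignments (relying on the Lemma 1 property mentioned in Section 5.3), and the seed oracle enumerates all k-sets. Both oracles take exponential time, so this shows only the structure of the idea, not an efficient method; all names are hypothetical.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, List, Sequence, Tuple

Edge = Tuple[str, str]
Instance = Dict[Edge, float]


def exact_influence(probs: Instance, seeds: Sequence[str]) -> float:
    """Exact DIC influence by enumerating every live-edge subgraph (tiny graphs only)."""
    edges = list(probs)
    total = 0.0
    for live in product((False, True), repeat=len(edges)):
        weight = 1.0
        adj: Dict[str, List[str]] = {}
        for (u, v), on in zip(edges, live):
            weight *= probs[(u, v)] if on else 1.0 - probs[(u, v)]
            if on:
                adj.setdefault(u, []).append(v)
        if weight == 0.0:
            continue
        seen, stack = set(seeds), list(seeds)
        while stack:
            for w in adj.get(stack.pop(), []):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        total += weight * len(seen)
    return total


def ratio(probs: Instance, nodes: Sequence[str], k: int, S: Sequence[str]) -> float:
    """Influence of S relative to the best k-seed set for this instance."""
    opt = max(exact_influence(probs, T) for T in combinations(nodes, k))
    return exact_influence(probs, S) / opt


def column_generation(intervals: Dict[Edge, Tuple[float, float]],
                      nodes: Sequence[str], k: int) -> Tuple[FrozenSet[str], float]:
    """Alternate a robust seed oracle with a worst-case-instance oracle (both brute force)."""
    edges = sorted(intervals)
    # Candidate worst cases: every endpoint assignment (2^|E| of them).
    extremes = [dict(zip(edges, choice))
                for choice in product(*(intervals[e] for e in edges))]
    pool: List[Instance] = [extremes[0]]
    while True:
        # Seed oracle: best k-set against the instances encountered so far.
        S = max(combinations(nodes, k),
                key=lambda T: min(ratio(th, nodes, k, T) for th in pool))
        # Worst-case oracle: the endpoint instance that hurts S the most.
        worst = min(extremes, key=lambda th: ratio(th, nodes, k, S))
        if worst in pool:  # no new instance: S is optimal against all endpoint instances
            return frozenset(S), ratio(worst, nodes, k, S)
        pool.append(worst)


intervals = {("a", "b"): (0.0, 1.0), ("c", "d"): (0.0, 1.0)}
S, value = column_generation(intervals, ["a", "b", "c", "d"], k=1)
print(sorted(S), value)  # ['a'] 0.5
```

In this toy run, the loop adds two instances to the pool before it stops; any single seed is matched by an instance in which the other component's edge is live, so no choice beats a robust value of 0.5.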
In the context of the bigger agenda, one could conceive of other notions of robustness in Influence Maximization, perhaps tracing a finer line between worst-case and Bayesian models. Also, much more research is needed into identifying which influence models best capture the behavior of real-world cascades, and under what circumstances. It is quite likely that different models will perform differently depending on the type of cascade and many other factors, and in-depth evaluations of the models could give practitioners more guidance on which mathematical models to choose. While our model of robustness allows us to combine instances of different models (e.g., IC and LT), this may come at a cost of decreased performance for each of the models individually. Thus, it remains an important task to identify the influence models that best fit real-world data.
Acknowledgments
We would like to thank Shaddin Dughmi for useful pointers and feedback, and Shishir Bharathi and Mahyar Salek for useful discussions, and anonymous reviewers for useful feedback. The research was sponsored in part by NSF research grant IIS-1254206 and by the U.S. Defense Advanced Research Projects Agency (DARPA) under Social Media in Strategic Communication (SMISC) program, Agreement Number W911NF-12-1-0034. The views and conclusions are those of the authors and should not be interpreted as representing the official policies of the funding agency, or the U.S. Government.
References
 [1] Bruno Abrahao, Flavio Chierichetti, Robert Kleinberg, and Alessandro Panconesi. Trace complexity of network inference. In Proc. 19th Intl. Conf. on Knowledge Discovery and Data Mining, pages 491–499, 2013.
 [2] Abhijin Adiga, Chris J. Kuhlman, Henning S. Mortveit, and Anil Kumar S. Vullikanti. Sensitivity of diffusion dynamics to network uncertainty. In Proc. 28th AAAI Conf. on Artificial Intelligence, 2013.
 [3] Christian Borgs, Michael Brautbar, Jennifer Chayes, and Brendan Lucier. Maximizing social influence in nearly optimal time. In Proc. 25th ACM-SIAM Symp. on Discrete Algorithms, pages 946–957, 2014.
 [4] Karen E. Campbell and Barrett A. Lee. Name generators in surveys of personal networks. Social Networks, 13(3):203–221, 1991.
 [5] Wei Chen, Laks V.S. Lakshmanan, and Carlos Castillo. Information and Influence Propagation in Social Networks. Synthesis Lectures on Data Management. Morgan & Claypool, 2013.
 [6] Wei Chen, Tian Lin, Zihan Tan, Mingfei Zhao, and Xuren Zhou. Robust influence maximization. In Proc. 22nd Intl. Conf. on Knowledge Discovery and Data Mining, 2016.
 [7] Wei Chen, Yajun Wang, and Siyu Yang. Efficient influence maximization in social networks. In Proc. 15th Intl. Conf. on Knowledge Discovery and Data Mining, pages 199–208, 2009.
 [8] Wei Chen, Yajun Wang, Yang Yuan, and Qinshi Wang. Combinatorial multi-armed bandit and its extension to probabilistically triggered arms. J. Mach. Learn. Res., 17(1):1746–1778, 2016.
 [9] Wei Chen, Yifei Yuan, and Li Zhang. Scalable influence maximization in social networks under the linear threshold model. In Proc. 10th Intl. Conf. on Data Mining, pages 88–97, 2010.
 [10] Xiaodong Chen, Guojie Song, Xinran He, and Kunqing Xie. On influential nodes tracking in dynamic social networks. In Proc. 15th SIAM Intl. Conf. on Data Mining, pages 613–621, 2015.
 [11] Hadi Daneshmand, Manuel Gomez-Rodriguez, Le Song, and Bernhard Schölkopf. Estimating diffusion network structures: Recovery conditions, sample complexity & soft-thresholding algorithm. In Proc. 31st Intl. Conf. on Machine Learning, 2014.
 [12] Irit Dinur and David Steurer. Analytical approach to parallel repetition. In Proc. 45th ACM Symp. on Theory of Computing, pages 624–633, 2014.
 [13] Pedro Domingos and Matthew Richardson. Mining the network value of customers. In Proc. 7th Intl. Conf. on Knowledge Discovery and Data Mining, pages 57–66, 2001.
 [14] Nan Du, Yingyu Liang, Maria-Florina Balcan, and Le Song. Influence function learning in information diffusion networks. In Proc. 31st Intl. Conf. on Machine Learning, 2014.
 [15] Nan Du, Le Song, Manuel Gomez-Rodriguez, and Hongyuan Zha. Scalable influence estimation in continuous-time diffusion networks. In Proc. 25th Advances in Neural Information Processing Systems, 2013.
 [16] Jacob Goldenberg, Barak Libai, and Eitan Muller. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing Letters, 12:211–223, 2001.
 [17] Jacob Goldenberg, Barak Libai, and Eitan Muller. Using complex systems analysis to advance marketing theory development: Modeling heterogeneity effects on new product growth through stochastic cellular automata. Academy of Marketing Science Review, 2001.
 [18] Manuel Gomez-Rodriguez, David Balduzzi, and Bernhard Schölkopf. Uncovering the temporal dynamics of diffusion networks. In Proc. 28th Intl. Conf. on Machine Learning, pages 561–568, 2011.
 [19] Manuel Gomez-Rodriguez, Jure Leskovec, and Andreas Krause. Inferring networks of diffusion and influence. ACM Transactions on Knowledge Discovery from Data (TKDD), 5(4):21, 2012.
 [20] Manuel Gomez-Rodriguez and Bernhard Schölkopf. Submodular inference of diffusion networks from multiple trees. In Proc. 29th Intl. Conf. on Machine Learning, 2012.
 [21] Amit Goyal, Francesco Bonchi, and Laks V. S. Lakshmanan. A data-based approach to social influence maximization. Proc. VLDB Endowment, 5(1):73–84, 2011.
 [22] Amit Goyal, Francesco Bonchi, Laks V. S. Lakshmanan, and Suresh Venkatasubramanian. On minimizing budget and time in influence propagation over social networks. Social Network Analysis and Mining, 3(2):179–192, 2013.
 [23] Mark Granovetter. Threshold models of collective behavior. American Journal of Sociology, 83:1420–1443, 1978.
 [24] Xinran He and David Kempe. Stability of influence maximization. Unpublished Manuscript, available at http://arxiv.org/abs/1501.04579, 2015.
 [25] David Kempe, Jon Kleinberg, and Eva Tardos. Maximizing the spread of influence in a social network. Theory of Computing, 11(4):105–147, 2015. A preliminary version of the results appeared in KDD 2003 and ICALP 2005.
 [26] Sanjeev Khanna and Brendan Lucier. Influence maximization in undirected networks. In Proc. 25th ACM-SIAM Symp. on Discrete Algorithms, pages 1482–1496, 2014.
 [27] Andreas Krause, H. Brendan McMahan, Carlos Guestrin, and Anupam Gupta. Robust submodular observation selection. In Journal of Machine Learning Research, volume 9, pages 2761–2801, 2008.
 [28] Siyu Lei, Silviu Maniu, Luyi Mo, Reynold Cheng, and Pierre Senellart. Online influence maximization. In Proc. 21st Intl. Conf. on Knowledge Discovery and Data Mining, pages 645–654, 2015.
 [29] Jure Leskovec, Lars Backstrom, and Jon Kleinberg. Meme-tracking and the dynamics of the news cycle. In Proc. 15th Intl. Conf. on Knowledge Discovery and Data Mining, pages 497–506, 2009. Note: updated data sets at http://www.memetracker.org/.
 [30] Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. Kronecker graphs: An approach to modeling networks. Journal of Machine Learning Research, 11:985–1042, 2010.
 [31] Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie S. Glance. Cost-effective outbreak detection in networks. In Proc. 13th Intl. Conf. on Knowledge Discovery and Data Mining, pages 420–429, 2007.
 [32] Meghna Lowalekar, Pradeep Varakantham, and Akshat Kumar. Robust influence maximization (extended abstract). In Proc. 15th Intl. Conf. on Autonomous Agents and Multiagent Systems, pages 1395–1396, 2016.
 [33] Elchanan Mossel and Sebastien Roch. Submodularity of influence in social networks: From local to global. SIAM Journal on Computing, 39(6):2176–2188, 2010.
 [34] Seth A. Myers and Jure Leskovec. On the convexity of latent social network inference. In Proc. 22nd Advances in Neural Information Processing Systems, pages 1741–1749, 2010.
 [35] Harikrishna Narasimhan, David C. Parkes, and Yaron Singer. Learnability of influence in networks. In Proc. 27th Advances in Neural Information Processing Systems, pages 3168–3176, 2015.
 [36] George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher. An analysis of the approximations for maximizing submodular set functions. Mathematical Programming, 14:265–294, 1978.
 [37] Praneeth Netrapalli and Sujay Sanghavi. Learning the graph of epidemic cascades. In ACM SIGMETRICS Performance Evaluation Review, pages 211–222, 2012.
 [38] Matthew Richardson and Pedro Domingos. Mining knowledge-sharing sites for viral marketing. In Proc. 8th Intl. Conf. on Knowledge Discovery and Data Mining, pages 61–70, 2002.
 [39] Kazumi Saito, Masahiro Kimura, Kouzou Ohara, and Hiroshi Motoda. Selecting information diffusion models over social networks for behavioral analysis. In Proc. 2010 European Conference on Machine Learning and Knowledge Discovery in Databases: Part III, ECML/PKDD 10, pages 180–195, 2010.
 [40] Chi Wang, Wei Chen, and Yajun Wang. Scalable influence maximization for independent cascade model in large-scale social networks. Data Mining and Knowledge Discovery Journal, 25(3):545–576, 2012.
 [41] Yu Wang, Gao Cong, Guojie Song, and Kunqing Xie. Community-based greedy algorithm for mining top-K influential nodes in mobile social networks. In Proc. 16th Intl. Conf. on Knowledge Discovery and Data Mining, pages 1039–1048, 2010.
Appendix A Proof of Theorem 2
We prove the two parts of the theorem by (slightly different) reductions from the gap version of Set Cover. A Set Cover instance consists of a universe , a collection of subsets of , and an integer . A set cover is a collection such that . Without loss of generality, we assume that each element is contained in at least one set — otherwise, there trivially is no set cover. Also, without loss of generality, we assume that , as otherwise, one can trivially pick all sets or one designated set per element.
The gap version of Set Cover then asks us to decide whether there is a set cover of size or whether each set cover has size at least . (The algorithm is promised that the minimum size will never lie between these two values.) Dinur and Steurer [12, Corollary 1.5] showed that the gap version of Set Cover is NP-hard.
Part 1
Based on the Set Cover instance, we construct the following instance of Robust Influence Maximization under the DIC model. Let . The instance consists of bipartite graphs on a shared vertex set . contains one node for each set ; contains nodes for each element . Hence, the number of nodes in the constructed graph is ; in particular, it is polynomial, and the reduction takes polynomial time.
In the influence function, all nodes with have a directed edge with activation probability 1 (or exponential delay distribution with delay parameter 1) to all of the (for all ); no other edges are present. Hence, , and . For the CIC model, the time window has size .
First, consider the case when there is a set cover of size . Choose the corresponding as seed nodes, and call the resulting seed set . Because is a set cover, in the instance, all of the are activated, for a total of at least nodes. (Under the CIC model, all of these are activated with high probability, not deterministically, within the steps.) Because none of the nodes in and none of the have incoming edges in the instance, the optimum solution for that instance can activate at most all of the nodes and its selected nodes, for a total of . Thus, the objective function value will be 1 (or arbitrarily close to 1 w.h.p. for the CIC model).
Now assume that there is no set cover of size , and consider any seed set . Let be the number of nodes from selected as seeds. Because the set cannot be a set cover by assumption, there must be some . Therefore, under the corresponding influence function, none of the can ever be activated, except those selected directly in . Hence, the number of nodes activated under the influence function is at most . On the other hand, by selecting just one node corresponding to any set , one could have activated all of the (with high probability under the CIC model), for a total of . Thus, the objective function value is at most , where we crudely bounded both and by .
Hence, a bicriteria approximation algorithm could distinguish the two cases, and thus solve the gap version of Set Cover.
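The Part 1 construction can be checked on a toy instance. The sketch below covers the DIC case only and rests on assumptions of our own: as we read the construction, there is one influence function per element j, in which every set node with j in its set has probability-1 edges to all element-copy nodes; the number N of copies per element, the universe, the sets, and the node names are illustrative; and the per-function optima are brute-forced, which is feasible only for tiny instances. It reproduces the two cases of the proof: a set cover of size k attains robust value 1, while a non-cover leaves some function in which only the seeds themselves are active.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Graph = Dict[str, List[str]]


def reduction_part1(universe: Sequence[int], sets: Sequence[Set[int]],
                    N: int) -> Tuple[List[str], List[Graph]]:
    """One influence function per element j: set nodes x_s with j in sets[s]
    get probability-1 edges to all N * |U| element-copy nodes."""
    x_nodes = [f"x{s}" for s in range(len(sets))]
    y_nodes = [f"y{j}_{l}" for j in universe for l in range(N)]
    scenarios = [{f"x{s}": y_nodes for s, members in enumerate(sets) if j in members}
                 for j in universe]
    return x_nodes + y_nodes, scenarios


def reach(adj: Graph, seeds: Iterable[str]) -> int:
    """Number of nodes reachable from the seeds along probability-1 edges."""
    seen = set(seeds)
    stack = list(seen)
    while stack:
        for w in adj.get(stack.pop(), []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen)


def robust_value(nodes: List[str], scenarios: List[Graph], k: int,
                 seeds: Sequence[str]) -> float:
    """min over functions of influence(seeds) / best influence with k seeds (brute force)."""
    return min(reach(adj, seeds) / max(reach(adj, T) for T in combinations(nodes, k))
               for adj in scenarios)


universe = [0, 1, 2, 3]
sets = [{0, 1}, {2, 3}, {1, 2}]
nodes, scenarios = reduction_part1(universe, sets, N=2)
print(robust_value(nodes, scenarios, 2, ["x0", "x1"]))  # a set cover: 1.0
print(robust_value(nodes, scenarios, 2, ["x0", "x2"]))  # misses element 3: 0.2
```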
Part 2
For the second part, we just consider the gap version with a fixed , say, . Then, in the hard instances, and are polynomially related, which we assume here, i.e., for some constant which is independent of or .
Based on the Set Cover instance, we construct a different Robust Influence Maximization instance, consisting of a directed graph with three layers . The first layer again contains one node for each set ; the second layer now contains just one node for each element . There is an edge (with known influence probability 1, or exponential delay distribution with parameter 1) from to if and only if . The third layer contains nodes. For each and , there is a directed edge with complete uncertainty about its parameter: under the DIC model, the probability is in the interval , and under the CIC model, the edge delay is exponentially distributed with parameter in the interval . In total, the graph has nodes (in particular, polynomially many), and the reduction takes polynomial time. Because is at most polynomially smaller than , we have , and thus . For the CIC model, we set the time horizon to .
First, consider the case when there is a set cover of size . Consider choosing the corresponding as seed nodes; call the resulting seed set . will definitely activate all nodes in , for a total of . Now, consider any assignment of probabilities or edge delays to the edges from to , and an optimal seed set of size . Let be the set of seed nodes chosen from , of size . Then, definitely activates all of , and at most all nodes from as well as nodes from , for a total (so far) of . For any node , the probability that it is activated by is at least as large as under , because for any values of the individual activation probabilities or delays between and , the fact that activates all of ensures that any node in activated under is also activated under (by time , in the case of the CIC model). Because, the expected number of nodes activated from is at least as large under as under , and the ratio is . Since this holds for all settings of the activation probabilities or edge delay parameters, we get that .
Now assume that there is no set cover of size , and consider any seed set . If contained any node , we could replace it with any node such that and activate at least as many nodes as before, so assume without loss of generality that . Because , the gap guarantee implies that there is at least one node that is never activated by . Now consider the probability assignment for all , and for all . (Under the CIC model, set for all , and for all .) Then, the seed set cannot activate any nodes in (except those it may have selected), and will activate a total of at most nodes. (Under the CIC model, this statement holds with high probability.) On the other hand, the seed set (just a single node) would have activated all of (with high probability, under the CIC model), for a total of nodes. Hence, the ratio is at most , implying that .
If there were an bicriteria approximation algorithm for a sufficiently small constant , it could distinguish which of the two cases () applied, thus solving the gap version of Set Cover.