Stochastic Dynamic Programming Heuristics for Influence Maximization-Revenue Optimization
The well-known Influence Maximization (IM) problem has been actively studied over the past decade, with emphasis on marketing and social networks. Existing research has obtained solutions to the IM problem by computing the influence spread and exploiting the property of submodularity. This paper presents a novel approach to the IM problem geared towards optimizing clicks, and consequently revenue, within an Online Social Network (OSN). Our approach departs from existing approaches by adopting a decision-making perspective through Stochastic Dynamic Programming (SDP). Thus, we define a new problem, Influence Maximization-Revenue Optimization (IM-RO), and propose SDP as a method by which this problem can be solved. The SDP method offers lucrative gains for an advertiser in terms of optimizing clicks and generating revenue; however, one drawback of the method is its associated “curse of dimensionality”, particularly for problems involving a large state space. Thus, we introduce the Lawrence Degree Heuristic (LDH), Adaptive Hill-Climbing (AHC) and Multistage Particle Swarm Optimization (MPSO) heuristics as methods which are orders of magnitude faster than the SDP method whilst achieving near-optimal results. Through a comparative analysis on various synthetic and real-world networks, we present the AHC and LDH as heuristics well suited to the IM-RO problem in terms of their accuracy, running times and scalability under ideal model parameters. In this paper we present a compelling survey on the SDP method as a practical and lucrative method for spreading information and optimizing revenue within the context of OSNs.
advantages to advertising and compared to traditional direct marketing strategies due to its ease of deployment and ability to effectively use customers themselves to encourage product preferences in others marketing through online accounts for a major source of revenue for many OSNs. For example, according to , advertising continues to propel Facebook’s revenue generation, accounting for billion, the majority source of income for Facebook in 2016. OSNs utilize the advantage of viral marketing because one considers not only the effect of marketing to a customer so that the customer purchases a product but also the customer’s influence in persuading other customers to purchase as well.
The focus of this paper centers on the positioning of an advertiser’s link delivered to a web page, that is to say, the placement of advertisement impressions. Advertising companies have the task of placing impressions on pages to be displayed to their users. Thus, the objective of the problem becomes to place impressions to OSN users in a way that maximizes the value to the advertiser. The problem known as the IM problem was first formally expressed in  as choosing a good initial set of nodes to target in the context of influence models such as the Independent Cascade, Linear Threshold and generalizations that followed [8, 16]. However, the problem of choosing an ideal set of customers in a network to market to in order to generate the maximum profit to the advertiser was first studied in [11, 40]. Since its formal definition in , the IM problem has been actively studied by researchers over the past decade and is not restricted to applications in marketing but extends to healthcare, communication, education, agriculture, and epidemiology [27, 30, 35, 44]. For the IM problem, OSN users are represented in a graph , where the nodes of represent the users and the edges in represent the relationships between users. In , the problem was first defined as a discrete optimization problem and the term influence of a set of nodes , denoted by was defined to be the expected number of active nodes at the end of a diffusion process, given that was the initial active node set. According to the work done in , the IM problem therefore seeks to determine a parameter , that is, to find a set of maximum influence; where . Computing this set and the expected number of active nodes efficiently remains an open question, but very good estimates have been proposed and obtained [6, 7, 30].
This paper provides a novel approach to the IM problem and a formal definition of the model proposed in . We depart from all other existing approaches to IM and adopt a decision-making perspective primarily used in shortest paths and resource allocation problems [2, 29, 37, 39]. Thus, we define a new problem, the IM-RO problem, and implement SDP as the method by which this problem can be solved. The SDP method and its multistage attribute were demonstrated to generate lucrative gains to advertisers, causing an increase of over 80% in the expected number of clicks when evaluated on various networks.
Due to the complexity of the SDP method, we propose and analyze the LDH, AHC and MPSO algorithms as heuristics employed to tackle the “curse of dimensionality” associated with implementing SDP. Through experiments on synthetic networks, we demonstrate that all three methods achieve near-optimal solutions and are orders of magnitude faster than the SDP method. We provide a comparative analysis on various synthetic and real-world networks and present the LDH and AHC as promising heuristics in terms of their accuracy, running times and scalability under suitable model parameters. Although the MPSO heuristic generated the highest expected number of clicks when compared to the LDH and AHC heuristics, it is unreliable and its running time is too long, making it infeasible for large graphs, i.e., over 500 nodes. Our results reveal the high potential of the LDH and AHC heuristics as an effective advertising strategy in providing near-optimal expected click values in minimal running times.
Researchers have also sought to generate influence models which capture the real-world influence outlined by the IM problem and to determine node and edge probabilities from real graph data based on past propagations [16, 17, 41]. Effectively generating these influence models and computing their node and edge probabilities has also been widely researched. For the IM-RO problem we implement the Negative Influence Model (NIM) and Graph Influence Model (GIM) as the influence models which capture node and edge propagations among users within an OSN. At the end of each stage (a specified time period), after impressions have been placed with users, the node probabilities are updated based on the influence model used. The probability of a user clicking on an impression depends directly on their friends’ behavior (whether or not their friends have clicked on an impression).
The paper is organized as follows. We briefly revisit the introduction of the SDP method to the IM problem in Section 3. We then introduce the proposed heuristics (the LDH, AHC and MPSO algorithms) in Section 4. In Section 5 we provide experimental results and a performance analysis for synthetic networks and real-world OSNs. We conclude the paper in Section 6, summarizing the main contributions and directions for future work.
2 Related Work
The problem of selecting an ideal set of nodes in a graph or determining which set users should be marketed to in order to obtain the maximum expected profit from was first studied in [11, 40]. In these papers, viewed as trying to convince a subset of individuals to purchase a new product or innovation with the goal of generating further purchases over entire network. In other words, the choosing specific users in a network created a cascade over the entire network. Solutions to this problem comprised of both a non linear and a linear probabilistic model that optimized the revenue generated from sales.
Subsequently, in  this optimization problem was defined as the the emphasis shifted from maximizing the profit generated from sales to maximizing a cascade effect or the number of activated the end of a diffusion process in the context of diffusion [18, 15, 8, 16].
Thresholds models have been the context of sociological theory and social networks in [18, 32, 21]. However, the generalization of the Linear Threshold and Independent Cascade model proposed  lies at the core of most threshold models for the IM problem .
Using the Linear Threshold and Independent Cascade models, the problem was shown to be NP-hard in . Moreover, work done in  showed that using Linear Programming and the greedy algorithm, an approximate solution to the IM problem which was within of the optimal solution could be obtained. Approaches to solving influence maximization problems have been put forward in [36, 13]. Similar to the work done in [11, 40, 1], we focus on the selection of an ideal set of nodes for the purpose of optimizing clicks and revenue to the advertiser. We depart from approaches to the problem that utilize influence spread and the theory of submodular functions, as done in several papers [26, 6, 27, 28, 30, 7, 8, 3, 4, 14, 20, 46, 47], and focus on maximizing the expected gains for the advertiser. Our formulation of the problem formally known as the IM problem is novel since, in addition to adopting a decision-making perspective, its main goal is to maximize the expected number of clicks or revenue to the advertiser. Thus, we define a new problem, the IM-RO problem, and introduce SDP as the method by which this problem can be solved.
Definition 2.1 (Influence Maximization - Revenue Optimization).
Given a network modeled as a graph , a fixed number (impressions to be placed) and the probability of a user clicking on an impression, the problem seeks to find the optimal users with whom to place impressions so as to maximize the expected probability of clicking and ultimately revenue.
3 SDP for IM-RO
We consider the multistage SDP approach to the IM problem, first introduced in  and now defined as the IM-RO problem. The problem entails placing an integer number, , of impressions to OSN users over stages, with representing the number of stages to go. The aim is to determine the number of impressions to be placed in each stage, or impression-to-stage allocations , and the optimal users that solve equation (1):
where , is the total number of impressions allocated over stages such that , as a user can only be given an impression once.
and which represent user being not given or given and has not clicked or has clicked on an impression respectively (see  for more details).
We briefly revisit an evaluation of the SDP method through an analysis on two simple networks, Figure (1) and Figure (2). Tables (1, 2 and 3) provide a concise survey of the SDP model on these networks using both the GIM and NIM as the influence models by which probabilities are updated. The GIM is given by equation (2) and the NIM is given by equation (3).
For these models, represents the number of friends who have clicked on an impression, represents the total number of friends of user and represents a user’s initial probability of clicking on an impression at the start of a -stage problem when . This probability is a user’s natural inclination for clicking on an impression in the absence of any influence from friends. and are influence constants where is the negative influence constant associated with , the number of users who have been given impressions and have not clicked on them.
|Influence Model||Stages||Allocation||User||Expected Clicks||Time|
|Influence Model||Stages||Allocation||User||Expected Clicks||Time|
|Influence Model||Stages||Allocation||User||Expected Clicks||Time|
single stage problem involving 6 impressions with , the number of clicks is calculated as 1.5. The optimal expected number of clicks by the SDP method with , and under the GIM is 2.69, that is of approximately 80%. These results have considerable the advertiser in terms of spreading information and revenue. We note that for the NIM, the optimal expected number of clicks is achieved at 3 stages in both networks whilst for the GIM the optimal expected click value increases as the number of stages However, the running times to achieve these results are infeasible for large networks, hence the need for computationally less extensive heuristic solutions which optimal results. For both the and NIM model a significant in the optimal expected number clicks can be achieved at 3 reasonable time. Another fact, is the drastic increase in running times caused by adding a single impression. When and , the SDP method achieves the optimal solution in approximately 7 minutes on these
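The "approximately 80%" figure quoted above can be checked with a few lines of arithmetic. The sketch below assumes that every user starts with a click probability of 0.25 (an assumption; it reproduces the single-stage value of 1.5 for 6 impressions) and that the single-stage expected number of clicks is simply the sum of those probabilities. The function names are illustrative and not taken from the paper.

```python
def expected_clicks_single_stage(num_impressions, click_probability):
    """Expected clicks when all impressions are placed in one stage.

    Assumes every targeted user has the same click probability and that
    no influence update occurs (there is only one stage).
    """
    return num_impressions * click_probability


def percent_increase(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


# 6 impressions, assumed initial click probability of 0.25.
baseline = expected_clicks_single_stage(6, 0.25)

# 2.69 is the multistage SDP value reported under the GIM.
gain = percent_increase(baseline, 2.69)
print(baseline, round(gain, 1))
```

Under these assumptions the gain works out to a little over 79%, in line with the rounded figure of 80%.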
For an asymptotic analysis on the SDP method, we consider a 2-stage problem with impressions to be placed to its users. If we consider the impression-to-stage allocation [1, M-1], then there are possible combinations of users to choose from for this. For [2, M-2], there are possible combinations of users to choose from and possible combinations of users to choose from for [3, M-3]. If we continue counting the steps in this manner until the last impression-to-stage allocation , then using the Binomial Theorem, we can prove that is an upper bound on the number of steps to attain the optimal solution. Hence the SDP method has a complexity of in its worst case. For large graphs, this proves to be intractable. In order to reduce its complexity and evaluate the performance of the SDP method on larger networks, we propose heuristics which leverage the optimality of the SDP method whilst reducing its complexity.
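The counting argument can be illustrated numerically. The sketch below is not the paper's derivation: it assumes that, for an allocation [k, M-k], the first stage can be filled in C(N, k) ways, where N is the number of users (the exact expressions are not given in the text above). It then checks the Binomial Theorem identity that the sum of C(N, k) over all k equals 2^N, which bounds the count from above. Function names are hypothetical.

```python
from math import comb


def first_stage_choices(num_users, num_impressions):
    """Count first-stage user subsets over the allocations [k, M-k].

    Assumption: allocation [k, M-k] contributes C(N, k) first-stage
    choices, for k = 1, ..., M-1.
    """
    return sum(comb(num_users, k) for k in range(1, num_impressions))


def binomial_upper_bound(num_users):
    """Sum of C(N, k) for k = 0..N, which equals 2**N."""
    return 2 ** num_users


n_users, n_impressions = 10, 6
count = first_stage_choices(n_users, n_impressions)
bound = binomial_upper_bound(n_users)
print(count, bound)  # 637 1024
```

Even for this small case the bound doubles with every additional user, which is why exhaustive enumeration becomes impractical on large graphs.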
Below we describe three heuristics, the LDH, AHC and MPSO, that adopt the multistage aspect of the SDP method. The MPSO, however, is the least reliable in terms of its accuracy compared to the LDH and AHC, since its state space comprises all possible predetermined users in each impression-to-stage allocation and their associated expected number of clicks. This is an essential characteristic of the SDP method, and of the LDH and AHC heuristics, in attaining the optimal and near-optimal solution.
We begin by introducing the LDH as a method which reduces the complexity of the SDP method by reducing its branching factor. For a given 2 or 3 stage problem, the LDH generates the impression-to-stage allocation [1, M-1] or [1, 1, M-2] respectively. Next, the optimal expected number of clicks is computed for this impression-to-stage allocation using equation and with users . Here is the optimal solution to the IM-RO problem in which , the node of the highest valency in the graph, is selected at the first stage when . The inspiration for the LDH is based on the efficiency of well-known high degree heuristics in  as well as the experimental findings of Section 3, in which the optimal solution was achieved in 3 stages. As the LDH expands only one node corresponding to either [1, M-1] or [1, 1, M-2], its complexity is , which is a drastic reduction in the complexity of the SDP method.
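The two structural choices the LDH makes (a fixed allocation and a highest-degree first user) can be sketched as follows. This is only an illustration: the graph is assumed to be an adjacency dictionary, ties between equal-degree users are broken by insertion order, and the evaluation of the expected number of clicks for the remaining stages is omitted. The function names are hypothetical.

```python
def ldh_allocation(num_impressions, num_stages):
    """Impression-to-stage allocation used by the LDH: [1, M-1] or [1, 1, M-2]."""
    if num_stages == 2:
        return [1, num_impressions - 1]
    if num_stages == 3:
        return [1, 1, num_impressions - 2]
    raise ValueError("the LDH is described for 2- or 3-stage problems only")


def highest_degree_user(adjacency):
    """Return the user with the most friends (the node of highest valency)."""
    return max(adjacency, key=lambda user: len(adjacency[user]))


# Toy friendship graph: user -> list of friends (illustrative data only).
graph = {1: [2, 3, 4], 2: [1], 3: [1, 4], 4: [1, 3]}

first_user = highest_degree_user(graph)
allocation = ldh_allocation(6, 3)
print(first_user, allocation)  # 1 [1, 1, 4]
```

Because only one allocation and one first-stage user are expanded, the search no longer branches over every possible first-stage choice.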
The hill-climbing search algorithm often referred to as the greedy hill-climbing algorithm is an example of a local search algorithm that operates by expanding a single node and navigating to with the goal of finding the global one exists. The general hill-climbing algorithm and its variants have been proposed in [45, 10]. Moreover, for the IM problem the greedy algorithm and improvements of this algorithm have been proposed in several papers [22, 27, 30, 28, 6].
We implement an adaptive hill-climbing technique for the IM-RO problem with the functionality of the general hill-climbing algorithm; however, the algorithm expands nodes corresponding to the impression-to-stage allocation [1, M-1] or [1, M-2, 1] for a given 2 or 3 stage problem respectively. The first node to expand in the th stage-to-go is chosen randomly. Based on the click outcomes, the probabilities over the entire network are updated using either the NIM or GIM and the expected number of clicks is computed as in the SDP method. For the AHC algorithm, each time a node is randomly chosen in the stage-to-go and the expected number of clicks is computed for the allocation using equation (1), its value is compared to the previous value computed. The AHC algorithm continues randomly expanding nodes in the stage-to-go and computing their associated optimal expected number of clicks for a specified number of iterations . In general, the hill-climbing algorithm does not guarantee the optimal solution; however, it has an memory and is quite efficient. We provide the hill-climbing algorithm adapted to the IM-RO problem as follows:
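A minimal sketch of the loop described above is given here, under stated assumptions. The callable `evaluate` is a hypothetical stand-in for computing the optimal expected number of clicks via equation (1) once a first-stage user is fixed; it is not the paper's implementation. Only the best candidate seen so far is stored, so memory use does not grow with the number of iterations.

```python
import random


def adaptive_hill_climb(users, evaluate, iterations, seed=None):
    """Randomly expand first-stage users and keep the best one seen.

    users      -- sequence of candidate users for the first stage
    evaluate   -- hypothetical callable: user -> expected number of clicks
    iterations -- number of random expansions to perform
    """
    rng = random.Random(seed)
    best_user, best_value = None, float("-inf")
    for _ in range(iterations):
        candidate = rng.choice(users)
        value = evaluate(candidate)
        # Compare against the previously computed best value.
        if value > best_value:
            best_user, best_value = candidate, value
    return best_user, best_value


# Illustrative scores only; a real run would evaluate equation (1).
toy_scores = {"a": 1.0, "b": 1.5, "c": 1.25}
user, value = adaptive_hill_climb(list(toy_scores), toy_scores.get, iterations=25, seed=0)
print(user, value)
```

More iterations raise the chance of sampling a strong first-stage user at the cost of more evaluations, which matches the trade-off between clicks and running time discussed for the AHC.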
Particle Swarm Optimization (PSO) was first proposed as one of the swarm intelligence algorithms for optimizing continuous nonlinear functions in . PSO is an algorithm that is modeled on the social behavior of swarming observed in insects, fish and birds . The main idea of PSO originated from the movement of bird flocks, in which the algorithm can find the optimal solution in the search space just like a flock of birds searching for its food. For the original continuous space PSO algorithm proposed in , the particles cooperated with each other in a global optimum and -dimensional search space in order to move to better positions. The position vector is used to denote the current solution of particle whilst the velocity vector is used to provide the direction of the particle and adjust the particle’s position to the optimal solution. Various researchers have extended the original PSO algorithm proposed in  to discrete optimization problems [9, 46, 42, 43]. The first of this kind was the binary particle swarm (BPSO) proposed in . Similar to the continuous space PSO algorithm, the discrete space PSO algorithm involves the following probability update rules:
particle maintains both a position and velocity over iterations given by is the vector representing the personal best solution of the particle and global best solution obtained by the entire swarm. and are parameters which weigh each particle’s own experience and the entire swarm respectively whilst , are constants such that , [0,1]. At each iteration, the particle’s velocity is updated by using its own search experience and the experience of the entire swarm as it flies to a
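For reference, the classical continuous-space PSO step can be sketched as below. This is the standard textbook form without an inertia weight and is offered only as an assumption about the rules referred to above, since the equations themselves are not reproduced here. The names `c1`, `c2` (weights on the particle's own experience and on the swarm's) and `r1`, `r2` (values in [0, 1]) are illustrative.

```python
def pso_step(position, velocity, personal_best, global_best, c1, c2, r1, r2):
    """One velocity and position update for a single particle.

    v_new = v + c1*r1*(personal_best - x) + c2*r2*(global_best - x)
    x_new = x + v_new
    All vectors are plain lists of equal length.
    """
    new_velocity = [
        v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
        for x, v, pb, gb in zip(position, velocity, personal_best, global_best)
    ]
    new_position = [x + nv for x, nv in zip(position, new_velocity)]
    return new_position, new_velocity


# One-dimensional example with r1 = r2 = 1.0 fixed so the result is deterministic.
new_x, new_v = pso_step([0.0], [1.0], [2.0], [4.0], c1=0.5, c2=0.5, r1=1.0, r2=1.0)
print(new_x, new_v)  # [4.0] [4.0]
```

In a discrete setting such as the MPSO, the subtraction and addition in this update cannot be applied to user sets directly, which is why they are redefined through swap operators.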
For the implementation of the MPSO algorithm, the state space comprises the set of possible predetermined users in each impression-to-stage allocation with their corresponding expected number of clicks. We modify and make use of a key concept called a Swap Operator proposed in  to handle discrete type PSO problems. For the implementation of the MPSO algorithm, a solution set can be described as a specific impression-to-stage allocation in which all of the users are identified. We define a Swap Operator as interchanging user i with the user in the -th position, as adding user to the th position in the stage-to-go and as removing user from the position in the stage-to-go. Using these Swap Operators we can redefine addition on the solution sets with a new solution . That is,
A swap sequence , is a sequence made up of one or more of the following Swap Operators as defined in equations
We redefine subtraction, on two solutions and as the Swap Sequence acting on the solution in order to obtain solution .
For example, consider an SDP formulation of the IM-RO problem involving 4 impressions and 2 stages, with two solutions and : =[2, 2] with users 1,2 in the first stage and 3,5 in the second stage.
= [1,3] with user 5 in the first stage and users 2,3,1 in the second stage.
We can apply the Swap Operator to remove user 2 from the first position and obtain a new solution = [1,2] with user 5 in the first stage and users 3,1 in the second stage. The second Swap Operator can be applied to where user 2 is added to position 2 in order to obtain a new solution = with users 5,2 in the first stage and users 3,1 in the second stage. The third Swap Operator is applied to and interchanges the user in position 1 with user 1. Thus = [2,2] with users 1,2 in the first stage and 3,5 in the second stage. Hence, a swap sequence with the least number of operators for is = . In implementing the MPSO, the velocity is updated using equation and applying the relevant swap sequences . We provide an algorithm, Algorithm for the procedure as follows:
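The worked example can be replayed in code. In the sketch below a solution is represented as a list of stages, each stage being a list of user ids; this representation and the three helper names are assumptions made for illustration. Positions are 0-based in the code, whereas the text counts from 1.

```python
def remove_user(solution, user):
    """Swap operator: remove `user` from whichever stage contains it."""
    return [[u for u in stage if u != user] for stage in solution]


def add_user(solution, user, stage_index, position):
    """Swap operator: insert `user` at `position` within stage `stage_index`."""
    new_solution = [list(stage) for stage in solution]
    new_solution[stage_index].insert(position, user)
    return new_solution


def interchange(solution, stage_index, position, user):
    """Swap operator: exchange the user at (stage_index, position) with `user`."""
    new_solution = [list(stage) for stage in solution]
    displaced = new_solution[stage_index][position]
    for stage in new_solution:
        for i, u in enumerate(stage):
            if u == user:
                stage[i] = displaced
    new_solution[stage_index][position] = user
    return new_solution


# Start: allocation [1, 3], user 5 in stage one and users 2, 3, 1 in stage two.
start = [[5], [2, 3, 1]]
step1 = remove_user(start, 2)          # [[5], [3, 1]]
step2 = add_user(step1, 2, 0, 1)       # [[5, 2], [3, 1]]
step3 = interchange(step2, 0, 0, 1)    # [[1, 2], [3, 5]]
print(step3)
```

Applying the three operators in order transforms the allocation [1, 3] into the target allocation [2, 2] with users 1,2 in the first stage and 3,5 in the second, matching the swap sequence described in the text.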
We evaluated the effectiveness of the proposed heuristics using synthetic and real-world OSNs.
We employed various synthetic networks and two real-world OSNs represented as graphs to analyze each method. Synthetic networks of various sizes 10, 50, 100, 500, 1000, 2000, 4000, 4500 and 5200 using a pseudo random number generator as done in . From a sample of 10 generated synthetic graphs, the average node degree was found to be at least 60% of the number of nodes in the
In addition to these networks, we utilized two real-world OSNs, Flickr and Epinions, obtained from the Social Computing Data Repository in  and the Stanford Network Analysis Platform in  respectively. The OSN Flickr is an image hosting and video sharing website where users can share images among each other. In this network "1,2" is used to represent the friendship relationship between user id 1 and user id 2. The entire dataset consists of 80,513 nodes; from this we extracted two datasets, FL1 comprising 11,098 nodes and FL2 comprising 20,217 nodes, each with an average node degree of 2, for the purpose of evaluating each heuristic.
Epinions is a customer review OSN in which users rate various products that are purchased on eBay. The entire dataset consists of 75,879 nodes, from which we extracted a dataset of 4,382 nodes with an average node degree of 3; we refer to this dataset as Ep.
5.2 Experimental Settings
Influence models for the IM problem described as models which capture real-world propagations or the spread of information among users within a network. In addition to the diffusion models; the Linear Independent Cascade models defined in , influence models that determine node and edge probabilities have been [11, 40, 13, 16, 4, 5].
For the IM-RO problem we introduce the GIM equation (2) and NIM influence models by which probabilities are updated at the end of each stage. The SDP method for the problem adopts a multistage approach and at each users are provided with advertising links or impressions. At the end of each stage, the outcomes or whether a user has clicked or not are determined and this information is utilized in the influence models to update the probabilities for future stages. The objective thus becomes to determine the number of impressions to be placed at each stage and the users to place impressions to, so as to maximize the number of purchases. A user clicking on an impression is equated to a user purchasing a product, therefore optimizing the revenue generated is identical to optimizing the expected number of clicks.
A user’s initial probability of arbitrarily set to be for these experiments, and were also set to be 0.25. However for future work, we will can be effectively estimated using data mining techniques.
All our experimentation was undertaken on a server with 8GB of RAM and an i3 processor. The SDP method and the LDH, AHC and MPSO heuristics were implemented from scratch using Python 2.7 (64 bit), with an average of 10 runs taken for each experiment.
5.3 Performance Analysis on Synthetic Networks
|Influence Model||Graph Size||Iteration (n)||Optimal Clicks||Time (secs)|
|Graph Size||Swarm Size||Iteration||Optimal Clicks||Time (secs)|
The results indicated in Table (5) convey the optimal expected number of clicks and running times of the LDH under the GIM. As shown in Table (5), the LDH is orders of magnitude faster than the SDP method, achieving a “good” solution of 1.45 in less than an hour on a synthetic graph of 5200 nodes. For now, we can think of a “good” solution as a solution that is at least as high as the value obtained by placing all the impressions in one stage; however, for future work we will obtain an upper bound on the optimal solution, as this will provide greater insights into reasonable solutions and how well these heuristics perform on large graphs. We note that the number of clicks determined by the SDP method on a graph of 10 nodes was found to be 1.91 under identical model parameters of this experiment. We further note increase in the value of , even when is assigned small values , results in increase in the expected number of clicks. Hence we propose the LDH method as a reasonable and promising method which leverages the accuracy of the SDP method whilst reducing its
To evaluate the performance of the AHC algorithm, we varied the graph sizes and number of iterations. The results in Table (4) indicate that the optimal expected number of clicks increases with the number of iterations, particularly for large values of , that is, as seen in the graphs of 50 and 500 nodes. The AHC algorithm generates a value of 1.250 for a graph of 2000 nodes in less than 50 iterations and 1.252 for a graph of 500 nodes in 100 iterations, both values in less than an hour. Under the GIM, the AHC generates higher values, as high as 1.821 for a graph of 50 nodes in 50 iterations. For a network of 2000 nodes, the AHC generates 1.71 clicks in approximately one hour. However, for a graph of 4,500 nodes, notably under the NIM, the AHC proves to be infeasible, taking 5 hours to generate 1.250 clicks in 10 iterations. Hence we consider the AHC method a moderately efficient method for obtaining near-optimal solutions to the IM-RO problem, taking into consideration that (1) increasing the number of iterations increases both the optimal expected number of clicks and the running times and (2) utilizing ideal influence model parameters can generate higher optimal expected click values.
Table (6) provides an analysis of the MPSO method on the IM-RO problem with both and set to 0.5. In particular, we note the effect of increasing the swarm size and the number of iterations on the optimal expected number of clicks. For a network of 50 users with , and less than 10 iterations, the MPSO method generates “good” results in minutes under the NIM. However, for larger graphs (i.e., greater than 500 nodes), the MPSO converges slowly, taking hours to converge to less accurate solutions. This is primarily due to the fact that its running time increases significantly with its swarm size and number of iterations. From the analysis in Table (6), we can conclude that the MPSO method is a fairly reasonable algorithm in terms of achieving near-optimal solutions; however, its running time is too long, making it infeasible for large graphs. Moreover, it is unreliable in terms of accuracy since its state space consists of a set of optimal expected click values for predetermined users in impression-to-stage allocations.
We observe Figure (3) and Figure and note the effect of varying on all three methods. The results indicate that the LDH and AHC generate identical optimal expected click values on various synthetic networks. We consider Figure when and highlight the significant increase in the optimal expected number of clicks from 1.0 to 1.5. These results represent considerable gains for any OSN advertiser and have significant implications for the choice of influence models and the effect of optimizing influence model parameters in maximizing the expected number of clicks. The similarity in performance of the LDH and AHC algorithms can also be attributed to the similarity of the synthetic networks, each being generated by the same random number generator. We note that the MPSO algorithm generates the highest expected number of clicks for all graph sizes; however, its running time is too long for large graphs. This result is further supported in our scalability analysis.
To evaluate the scalability, the sizes of synthetic networks were doubled from 250, 500, 1000,…, up to 4000 nodes.
Figures (6) and (6) demonstrate the results of the running times of the LDH, AHC and MPSO methods on a regular scale and log-log scale respectively. From the result in Fig (6), we can clearly deduce that the PSO algorithm is not scalable since its running time is in the hour range for 2000 nodes, making it infeasible to run on larger graphs. We also consider the high degree of the graphs generated by the pseudo random number generator, allowing them to be suitable indicators for relatively any dataset. Figure (6) provides a further differentiation between the algorithms. From these results we conclude that all three algorithms have similar slopes; however, the LDH and AHC have both a good slope and intercept, making them suitable for large graphs with at least thousands of nodes and edges.
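The slope and intercept read off a log-log running-time plot can be estimated with an ordinary least-squares fit on the logarithms of graph size and time. The sketch below uses made-up timings that follow an exact quadratic law purely to illustrate the calculation; they are not measurements from this paper, and the function name is hypothetical.

```python
import math


def loglog_fit(sizes, times):
    """Least-squares fit of log(time) = slope * log(size) + intercept."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Graph sizes doubled from 250 to 4000, as in the scalability setup.
sizes = [250, 500, 1000, 2000, 4000]
# Synthetic timings obeying t = 3e-6 * n**2 (illustrative only).
times = [3e-6 * n ** 2 for n in sizes]

slope, intercept = loglog_fit(sizes, times)
print(round(slope, 3), round(math.exp(intercept), 8))
```

The slope estimates the polynomial order of growth, while the intercept captures the constant factor; two methods with similar slopes can therefore still differ by orders of magnitude in absolute running time.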
5.5 Performance on real-world OSNs
We compare the computational time and optimal expected number of clicks generated by the LDH and AHC heuristics on two real-world OSNs under the GIM with model parameters , and 10 iterations. Table and Figure indicate that the performance of the LDH is considerably better than, or at least as good as, the AHC heuristic in terms of the optimal number of clicks generated on the Epinions dataset, whilst the AHC generates significantly higher optimal expected click values on the Flickr dataset. We attribute these results to the design of the LDH being more suited to the structure of the OSN Epinions and less to the structure of Flickr. Indeed, while both the LDH and the AHC heuristic achieve near-optimal solutions in a run time of under 30 minutes even for a network of 20,217 users, the LDH attains the optimal values in seconds for all three networks. In general, the AHC is well suited to both the Epinions and Flickr OSNs in terms of its accuracy and running times. For a problem involving 5 impressions, the optimal expected number of clicks generated is at least 2. Although the LDH generates similar results for a problem involving 5 impressions on the Epinions dataset, the optimal expected number of clicks generated from the Flickr dataset is 1 even when there is an increase in the number of stages.
|Method||OSN||Impressions||Stages||Optimal Clicks||Time (secs)|
The LDH and AHC heuristics exhibit good performance and are orders of magnitude faster than the SDP method. The results for these heuristics suggest that advertising companies can target the optimal users to market (or spread information) to in OSNs in a way that can generate predictable and lucrative gains for socio-economic advancement.
We provide a novel approach to influence maximization which until now has been primarily used in resource allocation and shortest path problems. We depart from previous approaches to influence maximization based on the theory of submodular functions and adopt a novel decision-making approach geared towards maximizing clicks and revenue among users of an OSN. Hence we redefine the problem as IM-RO and introduce SDP as the method by which this problem can be solved. We first reviewed the properties of the SDP method on small synthetic networks and highlighted the lucrative advantages that our method offers to advertising companies in terms of generating revenue and optimizing clicks. Due to the complexity of the SDP method, we sought to obtain heuristics which achieved near-optimal solutions in considerably less
Second, we proposed three heuristics, the LDH, AHC and MPSO algorithms, which exploited the multistage attribute of the SDP method whilst reducing its complexity. In addition to achieving near-optimal solutions, all three methods were found to be orders of magnitude faster than the SDP method. We provided a scalability analysis and evaluated our proposed heuristics on synthetic networks of various sizes and two real-world OSNs, Flickr and Epinions. The LDH and AHC are shown to be well-suited heuristics for the SDP method in terms of their accuracy, scalability and running times. The AHC is a more efficient heuristic than the LDH since it outperforms the LDH in terms of accuracy and running times for the two real-world OSNs.
We confirmed that the GIM exceeded the NIM in generating the optimal expected number of clicks with approximately the same computational times. It was shown that increasing within both influence models significantly increased the optimal expected number of clicks even when remained small, e.g. . This result has substantial implications for the potential gains in obtaining ideal influence models and optimizing their associated model parameters.
Our immediate future work is to provide an extensive analysis of our influence models and of how their parameters affect the IM-RO problem. It is also necessary to obtain accurate estimates of the influence model parameters through statistical and data mining techniques in order to improve the optimality of the expected number of clicks.
As an immediate consequence of approaching the IM problem from a decision-making perspective, there are multiple directions for future work, both in optimization (approximate dynamic programming methods) and in data science. The results presented provide an evaluation of our methods on large networks. It is also necessary to derive an upper bound on the objective function in order to determine how well our methods perform on these large networks. Another direction for future work related to influence maximization is to obtain the influence spread for the IM-RO problem, where the influence spread is defined as a function of the number of stages of the problem. Our future work also includes exploring applications in the fields of healthcare, communication, epidemiology, education, and agriculture.