Stochastic Dynamic Programming Heuristics for Influence Maximization-Revenue Optimization
Abstract
The well-known Influence Maximization (IM) problem has been actively studied by researchers over the past decade, with emphasis on marketing and social networks. Existing research has obtained solutions to the IM problem by computing the influence spread and exploiting the property of submodularity. This paper presents a novel approach to the IM problem geared towards optimizing clicks, and consequently revenue, within an Online Social Network (OSN). Our approach departs from existing approaches by adopting a decision-making perspective implemented through Stochastic Dynamic Programming (SDP). Thus, we define a new problem, Influence Maximization-Revenue Optimization (IMRO), and propose SDP as a method by which this problem can be solved. The SDP method offers lucrative gains for an advertiser in terms of optimizing clicks and generating revenue; however, one drawback of the method is its associated “curse of dimensionality”, particularly for problems involving a large state space. We therefore introduce the Lawrence Degree Heuristic (LDH), Adaptive Hill Climbing (AHC) and Multistage Particle Swarm Optimization (MPSO) heuristics as methods which are orders of magnitude faster than the SDP method whilst achieving near-optimal results. Through a comparative analysis on various synthetic and real-world networks, we present the AHC and LDH as heuristics well suited to the IMRO problem in terms of their accuracy, running times and scalability under ideal model parameters. In this paper we present a compelling survey of the SDP method as a practical and lucrative method for spreading information and optimizing revenue within the context of OSNs.
1 Introduction
Viral marketing possesses lucrative advantages for advertising and marketing companies compared to traditional direct marketing strategies due to its ease of deployment and its ability to use customers themselves to encourage product preferences in others [34]. Viral marketing through online advertising accounts for a major source of revenue for many OSNs. For example, according to [49], advertising continues to propel Facebook’s revenue generation and was the majority source of income for Facebook in 2016. OSNs exploit the advantage of viral marketing because one considers not only the effect of marketing to a customer, so that the customer purchases a product, but also the customer’s influence in persuading other customers to purchase as well.
The focus of this paper centers on the positioning of an advertiser’s link delivered to a web page, that is to say, the placement of advertisement impressions. Advertising companies have the task of placing impressions on pages to be displayed to their users. Thus, the objective becomes to place impressions with OSN users in a way that maximizes the value to the advertiser. The problem known as the IM problem was first formally expressed in [22] as choosing a good initial set of nodes to target in the context of influence models such as the Independent Cascade, the Linear Threshold and the generalizations that followed [8, 16]. However, the problem of choosing an ideal set of customers in a network to market to in order to generate the maximum profit for the advertiser was first studied in [11, 40]. Since its formal definition in [22], the IM problem has been actively studied by researchers over the past decade and is not restricted to applications in marketing, but also arises in healthcare, communication, education, agriculture and epidemiology [27, 30, 35, 44]. For the IM problem, OSN users are represented in a graph, where the nodes represent the users and the edges represent the relationships between users. In [22], the problem was first defined as a discrete optimization problem, and the influence of a set of nodes was defined to be the expected number of active nodes at the end of a diffusion process, given that this set was the initial active node set. According to the work done in [22], the IM problem therefore seeks, for a given size parameter, a set of that size with maximum influence. Computing this set and the expected number of active nodes by an efficient method remains an open question, but very good estimates have been proposed and obtained [6, 7, 30].
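To make the notion of influence concrete, the following is a minimal sketch of how the expected number of active nodes could be estimated by Monte Carlo simulation under an Independent Cascade process. It is an illustration only and not the estimation method of [6, 7, 30]: the function names, the uniform activation probability `prob`, and the toy graph are all assumptions introduced here.

```python
import random


def simulate_cascade(graph, seeds, prob, rng):
    """Run one Independent Cascade diffusion and return the number of active nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        newly_active = []
        for node in frontier:
            for neighbor in graph.get(node, ()):
                # A newly active node gets one chance to activate each inactive neighbor.
                if neighbor not in active and rng.random() < prob:
                    active.add(neighbor)
                    newly_active.append(neighbor)
        frontier = newly_active
    return len(active)


def estimate_influence(graph, seeds, prob=0.1, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of active nodes for a seed set."""
    rng = random.Random(seed)
    total = sum(simulate_cascade(graph, seeds, prob, rng) for _ in range(runs))
    return total / runs


# Hypothetical toy network: a directed path 0 -> 1 -> 2 -> 3.
toy_graph = {0: [1], 1: [2], 2: [3], 3: []}
print(estimate_influence(toy_graph, seeds=[0], prob=0.5))
```

Averaging over more runs tightens the estimate at the cost of running time, which is one reason exact evaluation of influence is avoided in practice.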
This paper provides a novel approach to the IM problem and a formal definition of the model proposed in [19]. We depart from all other existing approaches to IM and adopt a decision-making perspective primarily used in shortest path and resource allocation problems [2, 29, 37, 39]. Thus, we define a new problem, the IMRO problem, and implement SDP as the method by which this problem can be solved. The SDP method and its multistage attribute were demonstrated to generate lucrative gains for advertisers, yielding an increase of approximately 80% in the expected number of clicks when evaluated on various networks.
Due to the complexity of the SDP method, we propose and analyze the LDH, AHC and MPSO algorithms as heuristics to tackle the “curse of dimensionality” associated with implementing SDP. Through experiments on synthetic networks, we demonstrate that all three methods achieve near-optimal solutions and are orders of magnitude faster than the SDP method. We provide a comparative analysis on various synthetic and real-world networks and present the LDH and AHC as promising heuristics in terms of their accuracy, running times and scalability under suitable model parameters. Although the MPSO heuristic generated the highest expected number of clicks when compared to the LDH and AHC heuristics, it is unreliable and its running time is too slow, making it infeasible for large graphs, i.e., those with over 500 nodes. Our results reveal the high potential of the LDH and AHC heuristics as an effective advertising strategy that provides near-optimal expected click values in minimal running times.
Researchers have also sought to generate influence models which capture the real-world influence outlined by the IM problem and to determine node and edge probabilities from real graph data based on past propagations [16, 17, 41]. Effectively generating these influence models and computing their node and edge probabilities has also been a widely researched area. For the IMRO problem we implement the Negative Influence Model (NIM) and the Graph Influence Model (GIM) as the influence models which capture node and edge propagations among users within an OSN. At the end of each stage (a specified time period), after impressions have been placed with users, the node probabilities are updated based on the influence model used. The probability of a user clicking on an impression depends directly on their friends’ behavior (whether or not their friends have clicked on an impression).
The paper is organized as follows. We briefly revisit the introduction of the SDP method to the IM problem in Section 3. We then introduce the proposed heuristics (the LDH, AHC and MPSO algorithms) in Section 4. In Section 5 we provide experimental results and a performance analysis for synthetic networks and real-world OSNs. We conclude the paper in Section 6, summarizing the main contributions and directions for future work.
2 Related Work
The problem of selecting an ideal set of nodes in a graph, or determining which set of users should be marketed to in order to obtain the maximum expected profit from sales, was first studied in [11, 40]. In these papers, the problem was viewed as trying to convince a subset of individuals to purchase a new product or innovation with the goal of generating further purchases over the entire network. In other words, the problem entailed choosing specific users in a network who would create a cascade over the entire network. Solutions to this problem comprised both a nonlinear and a linear probabilistic model that optimized the revenue generated from sales.
Subsequently, in [22] this optimization problem was defined as the IM problem, and the emphasis shifted from maximizing the profit generated from sales to maximizing a cascade effect, that is, the number of activated nodes at the end of a diffusion process in the context of diffusion models [18, 15, 8, 16]. Threshold models have been studied in the context of sociological theory and social networks in [18, 32, 21]. However, the generalization of the Linear Threshold and Independent Cascade models proposed in [22] lies at the core of most threshold models for the IM problem.
Using the Linear Threshold and Independent Cascade models, the problem was shown to be NP-hard in [22]. Moreover, work done in [38] showed that, using linear programming and the greedy algorithm, an approximate solution to the IM problem within a factor of (1 - 1/e) of the optimal solution could be obtained. Other approaches to solving influence maximization problems have been put forward in [36, 13].
Similar to the work done in [11, 40, 1], we focus on the selection of an ideal set of nodes for the purpose of optimizing clicks and revenue for the advertiser. We depart from approaches to the problem that utilize influence spread and the theory of submodular functions, as done in several papers [26, 6, 27, 28, 30, 7, 8, 3, 4, 14, 20, 46, 47], and focus on maximizing the expected gains for the advertiser. Our formulation of the problem formally known as the IM problem is novel since, in addition to adopting a decision-making perspective, its main goal is to maximize the expected number of clicks or revenue for the advertiser. Thus we define a new problem, the IMRO problem, and introduce SDP as the method by which this problem can be solved.
Definition 2.1 (Influence Maximization-Revenue Optimization).
Given a network modeled as a graph, a fixed number of impressions to be placed, and the probability of a user clicking on an impression, the problem seeks to find the optimal users with whom to place impressions so as to maximize the expected probability of clicking and ultimately revenue.
3 SDP for IMRO
We consider the multistage SDP method for the IM problem, first introduced in [19] and now defined as the IMRO problem. The problem entails placing an integer number M of impressions with OSN users over a fixed number of stages, with the stage index counting the number of stages to go. The aim is to determine the number of impressions to be placed in each stage (the impression-to-stage allocation) and the optimal users that solve equation (1):
(1) 
where the total number of impressions allocated over all stages is constrained by the fact that a user can only be given an impression once. The state of each user records whether the user has or has not been given an impression and, if given one, whether the user has or has not clicked on it (see [19] for more details).
We briefly revisit an evaluation of the SDP method through an analysis on two simple networks, Figure (1) and Figure (2). Tables (1, 2 and 3) provide a concise survey of the SDP model on these networks using both the GIM and NIM as the influence models by which probabilities are updated. The GIM is given by equation (2) and the NIM is given by equation (3).
(2) 
(3) 
For these models, the inputs are the number of friends of a user who have clicked on an impression, the total number of friends of that user, and the user’s initial probability of clicking on an impression at the start of a stage problem. This probability is a user’s natural inclination for clicking on an impression in the absence of any influence from friends. The models also involve influence constants, one of which is the negative influence constant associated with the number of users who have been given impressions and have not clicked on them.
Influence Model  Stages  Allocation     User  Expected Clicks  Time (secs)
GIM              2       [2,4]          0,4   2.12             300
GIM              3       [1,2,3]        1     2.36             1,240
GIM              4       [1,1,2,2]      0     2.56             2,720
GIM              5       [1,1,1,2,1]    8     2.64             6,830
GIM              6       [1,1,1,1,1,1]  1     2.69             12,540
NIM              2       [1,2]          0     0.96             244
NIM              3       [2,1,3]        1,2   1.53             1,360
NIM              4       [2,1,3,0]      1,2   1.53             2,320
NIM              5       [2,1,3,0,0]    1,2   1.53             6,560
NIM              6       [2,1,3,0,0,0]  1,2   1.53             12,760
Influence Model  Stages  Allocation     User   Expected Clicks  Time (secs)
GIM              2       [3,3]          0,2,4  1.94             240
GIM              3       [2,2,2]        0,4    2.08             1,300
GIM              4       [1,2,2,1]      1      2.13             3,540
GIM              5       [1,1,2,1,1]    1      2.19             5,880
GIM              6       [1,1,1,1,1,1]  3      2.22             12,960
NIM              2       [2,4]          2,3    0.96             230
NIM              3       [2,3,1]        2,3    1.58             1,310
NIM              4       [1,1,3,1]      2      1.58             3,670
NIM              5       [1,1,3,1,0]    2      1.58             6,110
NIM              6       [1,1,3,1,0,0]  2      1.58             12,930
Influence Model  Impressions  Allocation  User  Expected Clicks  Time (secs)
GIM              3            [1,1,1]     0     1.04             18
GIM              4            [1,1,2]     0     1.48             90
GIM              5            [1,1,3]     0     1.91             440
GIM              6            [1,2,3]     0     2.36             1,331
NIM              3            [1,1,1]     3     0.8              14
NIM              4            [1,2,1]     3     1.06             110
NIM              5            [1,3,1]     0     1.3              473
NIM              6            [2,4]       7,8   1.53             1,300
For a single-stage problem involving 6 impressions, the optimal expected number of clicks is calculated as 1.5. The optimal expected number of clicks determined by the SDP method with 6 impressions and 6 stages under the GIM is 2.69, that is, an increase of approximately 80%. These results represent considerable gains for the advertiser in terms of spreading information and optimizing revenue. We note that for the NIM, the optimal expected number of clicks is achieved at 3 stages in both networks, whilst for the GIM the optimal expected click value increases as the number of stages increases.
However, the running times required to achieve these results are infeasible, especially for large networks, hence the need for computationally less expensive heuristic solutions which achieve near-optimal results. For both the GIM and the NIM, a significant increase in the optimal expected number of clicks can be achieved at 3 stages in reasonable time. Another interesting fact is the drastic increase in running times caused by adding a single impression: even on these simple networks, the SDP method can take approximately 7 minutes to reach the optimal solution.
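As a quick check of the gain quoted for the multistage SDP method, the relative increase from the single-stage value of 1.5 expected clicks to the six-stage GIM value of 2.69 can be worked out directly from the two reported figures:

\[
\frac{2.69 - 1.5}{1.5} \;=\; \frac{1.19}{1.5} \;\approx\; 0.793,
\]

that is, an increase of roughly 79%, which is consistent with the approximately 80% stated in the text.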
For an asymptotic analysis of the SDP method, we consider a 2-stage problem with M impressions to be placed with its users. If we consider the impression-to-stage allocation [1, M-1], then the number of possible combinations of users to choose from is given by a binomial coefficient. The same holds for the allocations [2, M-2] and [3, M-3]. If we continue counting the steps in this manner until the last impression-to-stage allocation, then using the Binomial Theorem we can prove an upper bound on the number of steps needed to attain the optimal solution, and this bound gives the worst-case complexity of the SDP method. For large graphs, this proves to be intractable.
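The counting argument can be illustrated with a short derivation. The notation here is an assumption made for the illustration, since the original symbols are not available: N denotes the number of users, M the number of impressions with M at most N, and only the choice of k users for the first stage of each allocation [k, M-k] is counted. Summing the binomial coefficients and applying the Binomial Theorem gives

\[
\sum_{k=1}^{M} \binom{N}{k} \;\le\; \sum_{k=0}^{N} \binom{N}{k} \;=\; (1+1)^{N} \;=\; 2^{N}.
\]

Under these assumptions the number of first-stage choices alone grows exponentially in the number of users, which is in line with the intractability noted above; whether this is exactly the bound derived by the authors cannot be confirmed from the text.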
In order to reduce its complexity and evaluate the performance of the SDP method on larger networks, we propose heuristics which leverage the optimality of the SDP method whilst reducing its complexity. Below we describe three heuristics, the LDH, AHC and MPSO, that adopt the multistage aspect of the SDP method. The MPSO, however, is the least reliable in terms of accuracy compared to the LDH and AHC, since its state space comprises all possible predetermined users in each impression-to-stage allocation and their associated expected number of clicks. This is an essential characteristic of the SDP method and of the LDH and AHC heuristics in attaining the optimal and near-optimal solutions.
4 Heuristics
4.1 LDH
We begin by introducing the LDH as a method which reduces the complexity of the SDP method by reducing its branching factor. For a given 2- or 3-stage problem, the LDH generates the impression-to-stage allocation [1, M-1] or [1, 1, M-2] respectively. Next, the optimal expected number of clicks is computed for this impression-to-stage allocation using equation (1), with the node of highest valency (degree) in the graph selected at the first stage. The inspiration for the LDH is based on the efficiency of well-known high-degree heuristics in [22] as well as the experimental findings of Section 3, in which the optimal solution was achieved in 3 stages. As the LDH expands only one node, corresponding to either [1, M-1] or [1, 1, M-2], its complexity is drastically lower than that of the SDP method.
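The two ingredients described above, the fixed allocation and the highest-degree first pick, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names and the adjacency-dictionary graph format are hypothetical, and the evaluation of equation (1) for the remaining stages is deliberately left out.

```python
def ldh_allocation(num_impressions, num_stages):
    """Return the single impression-to-stage allocation expanded by the LDH."""
    if num_stages not in (2, 3):
        raise ValueError("the LDH is described only for 2- or 3-stage problems")
    if num_impressions < num_stages:
        raise ValueError("need at least one impression per stage")
    if num_stages == 2:
        return [1, num_impressions - 1]      # [1, M-1]
    return [1, 1, num_impressions - 2]       # [1, 1, M-2]


def highest_degree_user(graph):
    """Pick the user with the most friends (ties go to the first one encountered)."""
    return max(graph, key=lambda user: len(graph[user]))


# Hypothetical toy network: user 0 is friends with everyone else.
toy_graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(ldh_allocation(6, 2), ldh_allocation(6, 3), highest_degree_user(toy_graph))
```

Because only one allocation and one first-stage user are considered, the search over first-stage combinations that dominates the SDP method is avoided.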
4.2 AHC
The hill-climbing search algorithm, often referred to as the greedy hill-climbing algorithm, is an example of a local search algorithm that operates by expanding a single node and navigating to neighboring nodes with the goal of finding the global minimum or maximum, if one exists. The general hill-climbing algorithm and its variants have been proposed in [45, 10]. Moreover, for the IM problem the greedy hill-climbing algorithm and improvements of this algorithm have been proposed in several papers [22, 27, 30, 28, 6]. We implement an adaptive hill-climbing technique for the IMRO problem with the functionality of the general hill-climbing algorithm; however, the algorithm expands nodes corresponding to the impression-to-stage allocation [1, M-1] or [1, M-2, 1] for a given 2- or 3-stage problem respectively. The first node to expand in the stage-to-go is chosen randomly. Based on the click outcomes, the probabilities over the entire network are updated using either the NIM or the GIM, and the expected number of clicks is computed as in the SDP method.
For the AHC algorithm, each time a node is randomly chosen in the stage-to-go and the expected number of clicks is computed for the allocation using equation (1), its value is compared to the previous value computed. The AHC algorithm continues randomly expanding nodes in the stage-to-go and computing their associated optimal expected number of clicks for a specified number of iterations n. In general, the hill-climbing algorithm does not guarantee the optimal solution; however, it has a low memory requirement and is quite efficient. We provide the hill-climbing algorithm adapted to the IMRO problem as follows:
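The authors' algorithm listing is not available in this text, so the following is only a minimal sketch of the loop described above. It assumes a hypothetical callable `expected_clicks(user)` that stands in for evaluating equation (1) with the given user chosen in the stage-to-go; that callable, the function name and the toy scores are all illustrative assumptions.

```python
import random


def adaptive_hill_climb(users, expected_clicks, iterations, seed=None):
    """Randomly expand users for a fixed number of iterations and keep the best one.

    `expected_clicks` is a stand-in for equation (1): it maps a candidate user
    to the expected number of clicks of the resulting allocation.
    """
    if iterations < 1 or not users:
        raise ValueError("need at least one iteration and one candidate user")
    rng = random.Random(seed)
    best_user, best_value = None, float("-inf")
    for _ in range(iterations):
        candidate = rng.choice(users)        # node chosen at random in the stage-to-go
        value = expected_clicks(candidate)   # evaluate the allocation for this node
        if value > best_value:               # compare with the best value seen so far
            best_user, best_value = candidate, value
    return best_user, best_value


# Hypothetical stand-in scores; a real run would evaluate equation (1) instead.
toy_scores = {0: 1.25, 1: 1.67, 2: 1.71, 3: 1.30}
print(adaptive_hill_climb(list(toy_scores), toy_scores.get, iterations=20, seed=1))
```

Only the current best user and value are stored, which reflects the low memory use noted above; more iterations raise the chance of sampling a strong user at the cost of more evaluations.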
4.3 MPSO
Particle Swarm Optimization (PSO) was first proposed as one of the swarm intelligence algorithms for optimizing continuous nonlinear functions in [12]. PSO is an algorithm modeled on the social behavior of swarming observed in insects, fish and birds [24]. The main idea of PSO originated from the movement of bird flocks: the algorithm searches for the optimal solution in the search space just as a flock of birds searches for food. In the original continuous-space PSO algorithm proposed in [12], the particles cooperate with each other in a multidimensional search space in order to move to better positions. The position vector denotes the current solution of a particle, whilst the velocity vector provides the direction of the particle and adjusts the particle’s position towards the optimal solution. Various researchers have extended the original PSO algorithm proposed in [12] to discrete optimization problems [9, 46, 42, 43]. The first of this kind was the binary particle swarm optimization (BPSO) algorithm proposed in [23]. Similar to the continuous-space PSO algorithm, the discrete-space PSO algorithm involves the following update rules:
(4) 
(5) 
Each particle maintains both a position and a velocity over iterations, together with the vector representing the personal best solution of the particle and the global best solution obtained by the entire swarm. Two parameters weigh each particle’s own experience and the experience of the entire swarm respectively, whilst two further constants take values in [0,1]. At each iteration, the particle’s velocity is updated using its own search experience and the experience of the entire swarm as it flies to a new search position.
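For orientation, one common textbook form of the continuous-space PSO update matching this description can be sketched as below. It is not necessarily the exact rule of equations (4) and (5): the names `c1`, `c2`, `r1` and `r2`, the absence of an inertia weight, and the default weights of 0.5 are all assumptions made for the illustration.

```python
import random


def pso_step(position, velocity, personal_best, global_best,
             c1=0.5, c2=0.5, rng=random):
    """Apply one velocity and position update to a single particle.

    c1 weighs the particle's own experience and c2 the swarm's experience;
    r1 and r2 are random factors drawn from [0, 1).
    """
    r1, r2 = rng.random(), rng.random()
    new_velocity = [
        v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
        for x, v, pb, gb in zip(position, velocity, personal_best, global_best)
    ]
    # The particle "flies" to its new position along the updated velocity.
    new_position = [x + nv for x, nv in zip(position, new_velocity)]
    return new_position, new_velocity


# A particle already sitting at both best positions keeps its velocity unchanged.
print(pso_step([1.0, 2.0], [0.5, -0.5], [1.0, 2.0], [1.0, 2.0]))
```

In the discrete setting discussed next, the arithmetic on vectors is replaced by operations on solutions, which is why the swap operators are needed.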
For the implementation of the MPSO algorithm, the state space comprises the set of possible predetermined users in each impression-to-stage allocation with their corresponding expected number of clicks. We modify and make use of a key concept called the Swap Operator, proposed in [46], to handle discrete PSO problems. A solution can be described as a specific impression-to-stage allocation in which all of the users are identified. We define three Swap Operators: the first interchanges a given user with the user in a given position, the second adds a user to a given position in the stage-to-go, and the third removes a user from a given position in the stage-to-go. Using these swap operators, we can redefine addition so that applying an operator to a solution yields a new solution. That is,
(6) 
(7) 
(8) 
A swap sequence is a sequence made up of one or more of the Swap Operators defined in equations (6), (7) and (8). We redefine the subtraction of two solutions as the Swap Sequence that acts on one solution in order to obtain the other.
For example, consider an SDP formulation of the IMRO problem involving 4 impressions and 2 stages, with two solutions:
The first solution is [2,2], with users 1, 2 in the first stage and users 3, 5 in the second stage.
The second solution is [1,3], with user 5 in the first stage and users 2, 3, 1 in the second stage.
We can apply the removal Swap Operator to the second solution, removing user 2 from the first position of the second stage, to obtain a new solution [1,2] with user 5 in the first stage and users 3, 1 in the second stage. The addition Swap Operator can then be applied, adding user 2 to position 2, to obtain a new solution [2,2] with users 5, 2 in the first stage and users 3, 1 in the second stage. The third Swap Operator is applied to this solution and interchanges the user in position 1 with user 1, giving [2,2] with users 1, 2 in the first stage and users 3, 5 in the second stage. Hence, a swap sequence with the least number of operators that transforms the second solution into the first consists of these three operators. In implementing the MPSO, the velocity is updated using the velocity update rule and applying the relevant swap sequences. We provide an algorithm for the procedure as follows:
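The authors' algorithm listing is not available in this text. As a partial illustration, the sketch below replays the worked example with three hypothetical helper functions standing in for the three Swap Operators. The representation of a solution as a list of stages, the function names, and the use of 0-based positions (the text counts from 1) are assumptions made here.

```python
def remove_user(solution, stage, user):
    """Remove `user` from the given stage, returning a new solution."""
    new_solution = [list(s) for s in solution]
    new_solution[stage].remove(user)
    return new_solution


def add_user(solution, stage, position, user):
    """Insert `user` at `position` within the given stage, returning a new solution."""
    new_solution = [list(s) for s in solution]
    new_solution[stage].insert(position, user)
    return new_solution


def swap_user(solution, position, user):
    """Interchange `user` with whoever occupies `position` in the flattened solution."""
    flat = [u for stage in solution for u in stage]
    other = flat.index(user)
    flat[position], flat[other] = flat[other], flat[position]
    # Split the flat list back into stages of the original sizes.
    new_solution, start = [], 0
    for stage in solution:
        new_solution.append(flat[start:start + len(stage)])
        start += len(stage)
    return new_solution


def allocation(solution):
    """Impression-to-stage allocation implied by a solution."""
    return [len(stage) for stage in solution]


second = [[5], [2, 3, 1]]                             # allocation [1,3]
step1 = remove_user(second, stage=1, user=2)          # allocation [1,2]
step2 = add_user(step1, stage=0, position=1, user=2)  # allocation [2,2]
step3 = swap_user(step2, position=0, user=1)          # equals the first solution
print(step1, step2, step3, allocation(step3))
```

Applying the three operators in order turns the second solution into the first, which is what the swap sequence in the example encodes.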
5 Experiments
We evaluated the effectiveness of the proposed heuristics using synthetic networks and real-world OSNs.
5.1 Datasets
We employed various synthetic networks and two real-world OSNs represented as graphs to analyze each method. Synthetic networks of sizes 10, 50, 100, 500, 1000, 2000, 4000, 4500 and 5200 nodes were generated using a pseudorandom number generator as done in [33]. From a sample of 10 generated synthetic graphs, the average node degree was found to be at least 60% of the number of nodes in the graphs.
In addition to these networks, we utilized two real-world OSNs, Flickr and Epinions, obtained from the Social Computing Data Repository [48] and the Stanford Network Analysis Platform [31] respectively. Flickr is an image hosting and video sharing website where users can share images with each other. In this network, “1,2” is used to represent the friendship relationship between user id 1 and user id 2. The entire dataset consists of 80,513 nodes; from this we extracted two datasets, FL1 comprising 11,098 nodes and FL2 comprising 20,217 nodes, each with an average node degree of 2, for the purpose of evaluating each heuristic.
Epinions is a customer review OSN in which users rate various products that are purchased on eBay. The entire dataset consists of 75,879 nodes, from which we extracted a dataset of 4,382 nodes with an average node degree of 3; we refer to this dataset as Ep.
5.2 Experimental Settings
Influence models for the IM problem can be described as models which capture real-world propagations, or the spread of information among users within a network. In addition to the diffusion models defined in [22], namely the Linear Threshold and Independent Cascade models, influence models that determine node and edge probabilities have been proposed in [11, 40, 13, 16, 4, 5]. For the IMRO problem we introduce the GIM, equation (2), and the NIM, equation (3), as the pertinent influence models by which probabilities are updated at the end of each stage. The SDP method for the problem adopts a multistage approach, and at each stage users are provided with advertising links or impressions. At the end of each stage, the outcomes (whether or not each user has clicked) are determined, and this information is used in the influence models to update the probabilities for future stages. The objective thus becomes to determine the number of impressions to be placed at each stage and the users with whom to place impressions, so as to maximize the number of purchases. A user clicking on an impression is equated to a user purchasing a product; therefore, optimizing the revenue generated is identical to optimizing the expected number of clicks.
A user’s initial probability of clicking was set arbitrarily for these experiments, and the influence constants were also arbitrarily set to 0.25. In future work, however, we will demonstrate that these parameters can be effectively estimated using data mining techniques.
All experiments were run on a server with 8 GB of RAM and an i3 processor. The SDP method and the LDH, AHC and MPSO heuristics were implemented from scratch in Python 2.7 (64-bit), with an average of 10 runs taken for each experiment.
5.3 Performance Analysis on Synthetic Networks
Influence Model  Graph Size  Iteration (n)  Optimal Clicks  Time (secs)
GIM              50          1              1.675           5
GIM              50          5              1.688           10
GIM              50          10             1.727           22
GIM              50          20             1.718           38
GIM              50          50             1.821           78
GIM              500         1              1.673           17
GIM              500         5              1.673           98
GIM              500         10             1.673           280
GIM              500         20             1.673           360
GIM              500         50             1.709           981
GIM              500         100            1.721           2,348
GIM              2000        1              1.673           231
GIM              2000        5              1.673           1,237
GIM              2000        10             1.673           2,257
GIM              2000        20             1.71            3,615
GIM              4500        1              1.672           1,218
GIM              4500        5              1.672           7,215
GIM              4500        10             1.672           14,427
NIM              50          1              1.260           5
NIM              50          5              1.261           8
NIM              50          10             1.262           15
NIM              50          20             1.262           25
NIM              50          50             1.263           53
NIM              500         1              1.251           23
NIM              500         5              1.251           85
NIM              500         10             1.251           187
NIM              500         20             1.251           327
NIM              500         50             1.251           877
NIM              500         100            1.252           2,060
NIM              2000        1              1.250           215
NIM              2000        5              1.250           1,311
NIM              2000        10             1.250           2,276
NIM              2000        20             1.250           3,616
NIM              4500        1              1.250           1,198
NIM              4500        5              1.250           7,228
NIM              4500        10             1.250           18,031
Datasets  500  1000  2000  4500  5200
Graph Size  Swarm Size  Iteration  Optimal Clicks  Time (secs)
50          10          1          1.260           5
50          10          10         1.261           50
50          10          20         1.261           63
50          10          40         1.261           117
50          10          80         1.261           261
50          10          100        1.261           275
50          50          1          1.262           417
50          50          10         1.262           154
50          50          20         1.262           294
50          50          40         1.262           535
50          50          80         1.262           1,335
50          100         1          1.262           63
50          100         10         1.262           284
50          100         20         1.262           608
50          100         40         1.262           1,062
50          100         80         1.262           2,176
500         10          1          1.251           114
500         10          10         1.251           357
500         10          20         1.251           697
500         10          40         1.251           1,706
500         10          80         1.251           3,606
500         20          1          1.251           179
500         20          10         1.251           973
500         20          20         1.251           2,221
500         20          40         1.251           3,617
500         20          80         1.251           7,440
500         50          1          1.251           467
500         50          10         1.251           2,258
500         50          20         1.251           4,920
500         50          40         1.251           7,257
2000        10          1          1.250           1,998
2000        10          10         1.250           10,838
The results in Table (5) convey the optimal expected number of clicks and running times of the LDH under the GIM. As shown in Table (5), the LDH is orders of magnitude faster than the SDP method, achieving a “good” solution of 1.45 in less than an hour on a synthetic graph of 5200 nodes. For now, we can think of a “good” solution as one that is at least as high as the value obtained by placing all the impressions in one stage; in future work, however, we will obtain an upper bound on the optimal solution, as this will provide greater insight into reasonable solutions and how well these heuristics perform on large graphs. We note that the optimal expected number of clicks determined by the SDP method on a graph of 10 nodes was found to be 1.91 under model parameters identical to those of this experiment. We further note that increasing an influence model parameter, even when the parameters are assigned small values, results in an increase in the expected number of clicks. Hence we propose the LDH as a reasonable and promising method which leverages the accuracy of the SDP method whilst reducing its complexity.
To evaluate the performance of the AHC algorithm, we varied the graph sizes and the number of iterations. The results in Table (4) indicate that the optimal expected number of clicks increases with the number of iterations, particularly for large values of n, as seen in the graphs of 50 and 500 nodes. Under the NIM, the AHC algorithm generates a value of 1.250 for a graph of 2000 nodes in fewer than 50 iterations and 1.252 for a graph of 500 nodes in 100 iterations, both in less than an hour. Under the GIM, the AHC generates higher values, as high as 1.821 for a graph of 50 nodes in 50 iterations. For a network of 2000 nodes, the AHC generates 1.71 clicks in approximately one hour. However, for a graph of 4,500 nodes, notably under the NIM, the AHC proves to be infeasible, taking 5 hours to generate 1.250 clicks in 10 iterations. Hence we consider the AHC a moderately efficient method for obtaining near-optimal solutions to the IMRO problem, taking into consideration that (1) increasing the number of iterations increases both the optimal expected number of clicks and the running times and (2) utilizing ideal influence model parameters can generate higher optimal expected click values.
Table (6) provides an analysis of the MPSO method on the IMRO problem with both of its parameters set to 0.5. In particular, we note the effect of increasing the swarm size and the number of iterations on the optimal expected number of clicks. For a network of 50 users and fewer than 10 iterations, the MPSO method generates “good” results in minutes under the NIM. However, for larger graphs (i.e., greater than 500 nodes), the MPSO converges slowly, taking hours to converge to less accurate solutions. This is primarily because its running time increases significantly with its swarm size and number of iterations. From the analysis in Table (6), we can conclude that the MPSO method is a fairly reasonable algorithm in terms of achieving near-optimal solutions; however, its running time is too slow, making it infeasible for large graphs. Moreover, it is unreliable in terms of accuracy since its state space consists of a set of optimal expected click values for predetermined users in impression-to-stage allocations.
We observe Figure (3) and the accompanying figure and note the effect of varying the model parameters on all three methods. The results indicate that the LDH and AHC generate identical optimal expected click values on various synthetic networks. We highlight the significant increase in the optimal expected number of clicks from 1.0 to 1.5. These results represent considerable gains for any OSN advertiser and have significant implications for the choice of influence models and the effect of optimizing influence model parameters in maximizing the expected number of clicks. The similarity in performance of the LDH and AHC algorithms can also be attributed to the similarity of the synthetic networks, each being generated by the same random number generator. We note that the MPSO algorithm generates the highest expected number of clicks for all graph sizes; however, its running time is too slow for large graphs, a result further supported by our scalability analysis.
5.4 Scalability
To evaluate scalability, the sizes of the synthetic networks were doubled from 250 to 500, 1000, and so on, up to 4000 nodes.
Figures (6) and (6) demonstrate the running times of the LDH, AHC and MPSO methods on a regular scale and a log-log scale respectively. From the result in Figure (6), we can clearly deduce that the MPSO algorithm is not scalable, since its running time is in the hour range for 2000 nodes, making it infeasible to run on larger graphs. We also note the high degree of the graphs generated by the pseudorandom number generator, which makes them suitable indicators for almost any dataset. Figure (6) provides a further differentiation between the algorithms. From these results we conclude that all three algorithms have similar slopes; however, the LDH and AHC have both a good slope and a good intercept, making them suitable for large graphs with at least thousands of nodes and edges.
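The slope and intercept comparison on a log-log scale can be made concrete with a least-squares fit of log running time against log graph size. The sketch below is illustrative only: the helper name is hypothetical, and the sample data are the AHC running times under the GIM with one iteration taken from the table in Section 5.3 (graph sizes 50, 500, 2000 and 4500), not the data behind the scalability figures.

```python
import math


def loglog_slope(sizes, times):
    """Least-squares slope and intercept of log10(time) against log10(size)."""
    xs = [math.log10(s) for s in sizes]
    ys = [math.log10(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# AHC under the GIM, 1 iteration (Section 5.3): graph size -> time in seconds.
sizes = [50, 500, 2000, 4500]
times = [5, 17, 231, 1218]
slope, intercept = loglog_slope(sizes, times)
print(round(slope, 2), round(intercept, 2))
```

A slope near 1 indicates running time growing roughly linearly with graph size, while the intercept captures the constant factor; two methods with similar slopes can therefore still differ greatly in practical running time.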
5.5 Performance on real world OSNs
We compare the computational time and optimal expected number of clicks generated by the LDH and AHC heuristics on two real-world OSNs under the GIM with fixed model parameters and 10 iterations. The results table and the corresponding figure indicate that the LDH performs considerably better than, or at least as well as, the AHC heuristic in terms of the optimal number of clicks generated on the Epinions dataset, whilst the AHC generates significantly higher optimal expected click values on the Flickr datasets. We attribute these results to the design of the LDH being more suited to the structure of Epinions and less to the structure of Flickr. Indeed, while both the LDH and the AHC heuristic achieve near-optimal solutions in a run time of under 30 minutes even for a network of 20,217 users, the LDH attains the optimal values in seconds for all three networks. In general, the AHC is well suited to both the Epinions and Flickr OSNs in terms of its accuracy and running times. For a problem involving 5 impressions, the optimal expected number of clicks generated is at least 2. Although the LDH generates similar results for a problem involving 5 impressions on the Epinions dataset, the optimal expected number of clicks generated from the Flickr dataset is 1 even when the number of stages is increased.
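Method  OSN (users)   Impressions  Stages  Optimal Clicks  Time (secs)
LDH     Ep (4,382)    5            2       1.56            4.4
LDH     Ep (4,382)    10           2       3               4.7
LDH     Ep (4,382)    50           2       13              4.7
LDH     Ep (4,382)    100          2       25.5            4.9
LDH     Ep (4,382)    200          2       50.5            5.5
LDH     Ep (4,382)    5            3       2.01            30
LDH     Fl1 (11,098)  5            2       1               12.6
LDH     Fl1 (11,098)  10           2       2.25            10
LDH     Fl1 (11,098)  50           2       12.25           10
LDH     Fl1 (11,098)  100          2       24.75           10
LDH     Fl1 (11,098)  200          2       49.75           10
LDH     Fl1 (11,098)  5            3       1               85
LDH     Fl2 (20,217)  5            2       1               15
LDH     Fl2 (20,217)  10           2       2.25            15
LDH     Fl2 (20,217)  50           2       12.25           16
LDH     Fl2 (20,217)  100          2       24.75           15
LDH     Fl2 (20,217)  200          2       49.75           16
LDH     Fl2 (20,217)  5            3       1               83
AHC     Ep (4,382)    5            2       1               20
AHC     Ep (4,382)    10           2       3.3             33
AHC     Ep (4,382)    50           2       13              389
AHC     Ep (4,382)    100          2       25.5            40
AHC     Ep (4,382)    200          2       50.6            42
AHC     Ep (4,382)    5            3       2.1             45
AHC     Fl1 (11,098)  5            2       1.5             56
AHC     Fl1 (11,098)  10           2       2.8             65
AHC     Fl1 (11,098)  50           2       12.9            65
AHC     Fl1 (11,098)  100          2       25.3            71
AHC     Fl1 (11,098)  200          2       50.3            97
AHC     Fl1 (11,098)  5            3       1.9             603
AHC     Fl2 (20,217)  5            2       1.55            163
AHC     Fl2 (20,217)  10           2       3.25            1701
AHC     Fl2 (20,217)  50           2       13.13           168
AHC     Fl2 (20,217)  100          2       26.2            160
AHC     Fl2 (20,217)  200          2       51.6            157
AHC     Fl2 (20,217)  5            3       2               1,298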
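As a worked reading of the table above, the following minimal sketch compares the two heuristics row by row on the two-stage Fl2 results. The values are copied from the table; the helper name `compare` and the choice of relative click gain and run-time ratio as summary measures are our own illustrative assumptions, not part of the reported method.

```python
# Two-stage rows for the Fl2 network (20,217 users), copied from the table above.
impressions = [5, 10, 50, 100, 200]
ldh = {"clicks": [1, 2.25, 12.25, 24.75, 49.75], "secs": [15, 15, 16, 15, 16]}
ahc = {"clicks": [1.55, 3.25, 13.13, 26.2, 51.6], "secs": [163, 1701, 168, 160, 157]}

def compare(baseline, candidate):
    """Per row: (relative click gain, run-time ratio) of candidate over baseline."""
    rows = []
    for b_clicks, c_clicks, b_secs, c_secs in zip(
        baseline["clicks"], candidate["clicks"], baseline["secs"], candidate["secs"]
    ):
        rows.append(((c_clicks - b_clicks) / b_clicks, c_secs / b_secs))
    return rows

rows = compare(ldh, ahc)
for k, (gain, ratio) in zip(impressions, rows):
    print(f"{k:>4} impressions: AHC clicks {gain:+.1%} vs LDH, run time x{ratio:.1f}")
```

On these rows the relative click gain of the AHC falls from 55% at 5 impressions to under 4% at 200 impressions, while its run time stays roughly ten times that of the LDH, apart from the 10-impression row.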
The LDH and AHC heuristics exhibit good performance and are orders of magnitude faster than the SDP method. The results for these heuristics suggest that advertising companies can target the optimal users to market (or spread information) to in OSNs in a way that can generate predictable and lucrative gains for socioeconomic advancement.
6 Conclusion
We provide a novel approach to influence maximization based on SDP, a method which until now has been primarily used in resource allocation and shortest path problems. We depart from previous approaches to influence maximization based on the theory of submodular functions and adopt a practical decision-making approach geared towards maximizing clicks and revenue among users of an OSN. Hence we redefine the problem as IMRO and introduce SDP as the method by which this problem can be solved. We first reviewed the properties of the SDP method on small synthetic networks and highlighted the lucrative advantages that our method offers to advertising companies in terms of generating revenue and optimizing clicks. Due to the complexity of the SDP method, we sought heuristics that achieve near-optimal solutions in considerably less time.
Second, we proposed three heuristics, the LDH, AHC and MPSO algorithms, which exploit the multistage attribute of the SDP method whilst reducing its complexity. In addition to achieving near-optimal solutions, all three methods were found to be orders of magnitude faster than the SDP method. We provided a scalability analysis and evaluated our proposed heuristics on synthetic networks of various sizes and two real-world OSNs, Flickr and Epinions. The LDH and AHC are shown to be well-suited heuristics for the SDP method in terms of their accuracy, scalability and running times. The AHC is a more efficient heuristic than the LDH since it outperforms the LDH in terms of accuracy and running times for the two real-world OSNs.
We confirmed that the GIM exceeded the NIM in generating the optimal expected number of clicks with approximately the same computational times. It was shown that increasing within both influence models significantly increased the optimal expected number of clicks even when remained small, e.g. . This result has substantial implications for the potential gains from obtaining ideal influence models and optimizing their associated model parameters.
Our immediate future work is to provide an extensive analysis of our influence models and how their parameters affect the IMRO problem. It is also necessary to obtain accurate estimates of the influence model parameters through statistical and data mining techniques in order to improve the optimality of the expected number of clicks.
As an immediate consequence of approaching the IM problem from a decision-making perspective, there are multiple directions for future work, both in terms of optimization (approximate dynamic programming methods) and data science. The results presented provide an evaluation of our methods on large networks. It is also necessary to derive an upper bound on the objective function in order to determine how well our methods perform on these large networks. Another direction for future work related to influence maximization is to obtain the influence spread for the IMRO problem, where the influence spread is defined as a function of the number of stages of the problem. Our future work also includes exploring applications of this approach in the fields of healthcare, communication, epidemiology, education, and agriculture.
References
 [1] Abbassi, Z., Bhaskara, A., Misra, V.: Optimizing display advertising in online social networks. In Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee.111 (2015).
 [2] Bertsekas, D. P., & Tsitsiklis, J. N., "An Analysis of Stochastic Shortest Path Problems", Mathematics of Operations Research. 16, 580595 (1991)
 [3] Bhagat, S., Goyal, A., Lakshmanan, L.: Maximizing product adoption in social networks. In Proceedings of the 5th ACM International Conference on Web search and Data Mining. ACM. 603612 (2012).
 [4] Cao T, Wu X., Hu T.X., Wang S.: Active learning of model parameters for influence maximization. InJoint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer. 280 295 (2011).
 [5] Chakrabarti, P., Dom, B., Indyk, P.: Enhanced hypertext categorization using hyperlinks. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, ACM. 307318 (1998)
 [6] Chen, W., Wang, W. Y., Yang, S.: Efficient influence maximization in social networks. In Proceedings of the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM.199208 (2009).
 [7] Chen, W., Wang, C., Wang, Y. : Scalable influence maximization for prevalent viral marketing in large scale social networks. In Proceedings of the 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM. 1029–1038 (2010)
 [8] Chen, W., Collins, A., Cummings, R.: Influence maximization in social networks when negative opinions may emerge and propagate. SIAM SDM, 11: 379390 (2011)
 [9] Clerc, M.: Discrete particle Swarm Optimization, in New Optimization Techniques in Engineering. New York, SpringerVerlag (2004).
 [10] Davis, L.D.: Bitclimbing, representational bias, and test suite design. In R. K. Belew and L. B. Booker (eds.), Proceedings of the Fourth International Conference on Genetic Algorithms, pp 1823. CA: Morgan Kaufmann, San Mateo (1991)
 [11] Domingos, P. & Richardson, M. : Mining the network value of customers. In Proceedings of the 7th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM. 5766 (2001)
 [12] Eberhart, R.C. , Kennedy,J. A new optimizer using particle swarm theory.In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, 1, 39–43 (1995)
 [13] Galhotra S., Arora A., Shourya R.: Holistic Influence Maximization: Combining Scalability and Efficiency with OpinionAware Models. In Proceedings of the 2016 International Conference on Management of Data. SIGMOD. 743 758 (2016)
 [14] Galstyan, A., Musoyan, V., Cohen, P.: Maximizing influence propagation in networks with community structure. Phys. Rev. E. 79(5), (2009)
 [15] GomezRodriguez M., Leskovec J., Krause A. Inferring networks of diffusion and influence. InProceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM. 10191028. (2010)
 [16] Goyal, A., Bonchi, M., & Lakshmanan, L.: Learning influence probabilities in social networks. In Proceedings of the third ACM international conference on Web search and data mining. 241250. (2010)
 [17] Goyal, A., Bonchi, F., Lakshmanan, L.: A database approach to social influence maximization. In Proceedings of the 38th international conference of the VLDM Endowment. ACM. 7384. (2011)
 [18] Granovetter, M. Threshold models of collective behavior. The American Journal of Sociology (6), 1420–1443. (1978)
 [19] Hosein, P., Lawrence, T.: Stochastic dynamic model for revenue optimization in social networks. In Proceedings of the 11th International Conference On Wireless and Mobile Computing, Networking and Communications.IEEE. 378383 (2015)
 [20] HosseiniPozveh, M., Zamanifar, K., NaghshNilchi, A., Dolog, P., Maximizing the spread of positive influence in signed social networks. Intelligent Data Analysis. 20(1) 199218 (2006)
 [21] Jackson, M. & Yariv, L.: Diffusion on social networks. Economie Publique. 16 6982. (2005).
 [22] Kempe, D., Kleinberg, J., Tardos, E.: Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM. 137146 (2003).
 [23] Kennedy, J., Eberhart, R.C.: A discrete binary version of the particle swarm algorithm. In Proceedings of IEEE International Conference on Computational Cybernetics and Simulation. IEEE. 5, 41044108 (1997).
 [24] Kennedy, J., Eberhart, R.C., Shi, Y.: Swarm Intelligence. Morgan Kaufmann Publishers, San Francisco, CA, (2004)
 [25] Kimura, M., Saito, K. Motod. H.: Minimizing the Spread of Contamination by Blocking links in a network Proceedings of the TwentyThird AAAI Conference on Artificial Intelligence. (2008).
 [26] Kimura, M., Saito, K.: Approximate solutions for the influence maximization problem in a social network. KnowledgeBased Intelligent Information and Engineering Systems. LNCS. 4252: 93744 (2006)
 [27] Kimura, M., Saito, K., Motod, H.: Minimizing the Spread of Contamination by Blocking links in a network Proceedings of the TwentyThird AAAI Conference on Artificial Intelligence. (2008).
 [28] Liu, B., Cong, G., Xu, D.: Time constrained influence maximization in social networks. In Proceedings of the 12th IEEE International Conference on Data Mining. IEEE. 43948. (2012).
 [29] Levi, R., Roundy, R., Shmoys, D.B. Provably nearoptimal samplingbased policies for stochastic inventory control models. Math. Oper. Res. 32 821839. (2007)
 [30] Leskovec, J., Krause, K. ,Geustrin, C.: Cost effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM. 420429. (2007).

[31]
Leskovec, J. &
Krevl, A.: SNAP Datasets:Stanford
Large Network Dataset Collection.
(2014)
http://snap.stanford.edu/data  [32] Macy, M. & Willer, R.: From Factors to Actors: Computational Sociology and AgentBased Modeling. Ann. Rev. Soc. (2002)
 [33] Matsumoto, M. & Nishimura, T. “Mersenne Twister: A 623dimensionally equidistributed uniform pseudorandom number generator”. ACM . Transactions on Modeling and Computer Simulation. 8. 330 (1998)
 [34] Meerman, D.S. Viral Marketing: Let the world tell your story for free [online], Pragmatic Marketing, Available from: <http://www.pragmatic marketing.com/publications/magazine/5/5/viralmarketingletthe worldtellyourstoryforfree>,(2008)Accessed 10 Oct 2017
 [35] Morone, F. & Makse, H.: Influence maximization in complex networks through optimal percolation.Nature. 524, 6568 (2015)
 [36] Narayanam, R., Narahari, Y.:Determining the topk nodes in social networks using the shapely value. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems. ACM. 15091512. (2008).
 [37] Nascimento, J. & Powell, W. :An Optimal Approximate Dynamic Programming Algorithm for the Economic Dispatch Problem with GridLevel Storage, IEEE Transactions on Automatic Control (2013)
 [38] Nemhauser, G.L & Wolsey, L.A. : An Analysis Of Approximations For Maximizing Submodular Set FunctionsI. Mathematical Programming. 14 265294 (1978).
 [39] Powell, W. B. : Exploration Versus Exploitation, in Approximate Dynamic Programming: Solving the Curses of Dimensionality, Second Edition, John Wiley & Sons, Inc., Hoboken, NJ, USA. (2011) doi: 10.1002/9781118029176.ch12
 [40] Richardson, M. & Domingos, R. : Mining knowledge sharing sites for viral marketing. In Proceedings of the 8th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM. 6170 (2002)
 [41] Saito, K., Nakano, R., Kimura, M. : Prediction of information diffusion probabilities for independent cascade model. Knowledge Based Intelligent and Engineering Systems. 6775 (2008).
 [42] Salman, A., Ahmad, I., AlMahadi, S.: Particle Swarm Optimization for task assignment problems. Microprocessor microsystems. 26, 363371 (2002)
 [43] Sha, D.Y. & Hsu, C. : A hybrid particle swarm optimization for job scheduling problem. Computers and Industrial Engineering. 51, 791–808 (2006)
 [44] Singer, Y. : How to Win Friends and Influence People, Truthfully: Influence Maximization Mechanisms for 470 Social Networks. Fifth ACM Int Conf Web Search Data Min. 110 (2012)
 [45] Tsang, E. & C. Voudouris, C.: Fast local search and guided local search and their application to british telecom’s workforce scheduling. Technical Report CSM246, Department of Computer Science, University of Essex, Colchester, UK, (1995)
 [46] Wang, Y., Cong, G., Song, G.: Communitybased greedy algorithm for mining to k influential nodes in mobile social networks. In Proceedings of the 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. DOI: 10.1145/1835804. 1835935. (2010)
 [47] Wu, HaoHsiang & Kucukyavuz, S.: Maximizing Influence in Social Networks: A TwoStage Stochastic Programming Approach That Exploits Submodularity. Department of Integrated Systems Engineering, The Ohio State University, Columbus, OH. (2016).
 [48] Zafarani, R. & Liu, H. Social Computing Data Repository at ASU (2009) [http://socialcomputing.asu.edu]. Tempe, AZ: Arizona State University, School of Computing, Informatics and Decision Systems Engineering
 [49] Facebook Reports Fourth Quarter and Full Year 2016 Results. https://s21.q4cdn.com/399680738/files/doc_financials/2016/Q4/FacebookReportsFourthQuarterandFullYear2016Results.pdf(2016). Accessed 3 January 2017