Stochastic Dynamic Programming Heuristics for Influence Maximization-Revenue Optimization

Trisha Lawrence
Department of Mathematics and Statistics
University of Saskatchewan
106 Wiggins Road
Saskatoon, SK S7N 5E6, CANADA
Abstract

The well-known Influence Maximization (IM) problem has been actively studied by researchers over the past decade, with emphasis on marketing and social networks. Existing research has obtained solutions to the IM problem by computing the influence spread and exploiting the property of submodularity. This paper takes a novel approach to the IM problem geared towards optimizing clicks, and consequently revenue, within an Online Social Network (OSN). Our approach departs from existing approaches by adopting a decision-making perspective through Stochastic Dynamic Programming (SDP). Thus, we define a new problem, Influence Maximization-Revenue Optimization (IM-RO), and propose SDP as a method by which this problem can be solved. The SDP method yields lucrative gains for an advertiser in terms of optimizing clicks and generating revenue; however, one drawback of the method is its associated "curse of dimensionality", particularly for problems involving a large state space. Thus, we introduce the Lawrence Degree Heuristic (LDH), Adaptive Hill-Climbing (AHC) and Multistage Particle Swarm Optimization (MPSO) heuristics as methods which are orders of magnitude faster than the SDP method whilst achieving near-optimal results. Through a comparative analysis on various synthetic and real-world networks, we present the AHC and LDH as heuristics well suited to the IM-RO problem in terms of their accuracy, running times and scalability under ideal model parameters. In this paper we present a compelling survey of the SDP method as a practical and lucrative method for spreading information and optimizing revenue within the context of OSNs.

1 Introduction

Viral marketing possesses lucrative advantages for advertising and marketing companies compared to traditional direct marketing strategies, due to its ease of deployment and its ability to use customers themselves to encourage product preferences in others [34]. Viral marketing through online advertising is a major source of revenue for many OSNs. For example, according to [49], advertising continues to propel Facebook's revenue generation and was its majority source of income in 2016. OSNs exploit the advantage of viral marketing because one considers not only the effect of marketing to a customer so that the customer purchases a product, but also the customer's influence in persuading other customers to purchase as well.
The focus of this paper is the positioning of an advertiser's link delivered to a web page, that is, the placement of advertisement impressions. Advertising companies have the task of placing impressions on pages to be displayed to their users. Thus, the objective is to place impressions to OSN users in a way that maximizes the value to the advertiser. The IM problem was first formally expressed in [22] as choosing a good initial set of nodes to target in the context of influence models such as the Independent Cascade and Linear Threshold models and the generalizations that followed [8, 16]. However, the problem of choosing an ideal set of customers in a network to market to in order to generate the maximum profit for the advertiser was first studied in [11, 40]. Since its formal definition in [22], the IM problem has been actively studied over the past decade, with applications not only in marketing but also in healthcare, communication, education, agriculture, and epidemiology [27, 30, 35, 44]. For the IM problem, OSN users are represented in a graph $G = (V, E)$, where the nodes $V$ represent the users and the edges $E$ represent the relationships between users. In [22], the problem was first defined as a discrete optimization problem, and the influence of a set of nodes was defined to be the expected number of active nodes at the end of a diffusion process, given that this set was the initial active node set. According to [22], the IM problem therefore seeks, for a given parameter, a set of nodes of maximum influence. Computing this set and its expected number of active nodes efficiently remains an open question, but very good estimates have been proposed and obtained [6, 7, 30].
This paper provides a novel approach to the IM problem and a formal definition of the model proposed in [19]. We depart from existing approaches to IM and adopt a decision-making perspective primarily used in shortest-path and resource allocation problems [2, 29, 37, 39]. Thus, we define a new problem, the IM-RO problem, and implement SDP as the method by which this problem can be solved. The SDP method and its multistage attribute were shown to generate lucrative gains for advertisers, producing an increase of approximately 80% in the expected number of clicks when evaluated on various networks.
Due to the complexity of the SDP method, we propose and analyze the LDH, AHC and MPSO algorithms as heuristics that tackle the "curse of dimensionality" associated with implementing SDP. Through experiments on synthetic networks, we demonstrate that all three methods achieve near-optimal solutions and are orders of magnitude faster than the SDP method. We provide a comparative analysis on various synthetic and real-world networks and present the LDH and AHC as promising heuristics in terms of their accuracy, running times and scalability under suitable model parameters. Although the MPSO heuristic generated the highest expected number of clicks when compared to the LDH and AHC heuristics, it is unreliable and its running time is too slow, making it infeasible for large graphs, i.e., graphs of over 500 nodes. Our results reveal the high potential of the LDH and AHC heuristics as an effective advertising strategy, providing near-optimal expected click values in minimal running times.
Researchers have also sought to generate influence models which capture the real-world influence outlined by the IM problem and to determine node and edge probabilities from real graph data based on past propagations [16, 17, 41]. Effectively generating these influence models and computing their node and edge probabilities has also been widely researched. For the IM-RO problem we implement the Negative Influence Model (NIM) and the Graph Influence Model (GIM) as the influence models which capture node and edge propagations among users within an OSN. At the end of each stage (a specified time period), after impressions have been placed with users, the node probabilities are updated based on the influence model used. The probability of a user clicking on an impression depends directly on their friends' behavior (whether or not their friends have clicked on an impression).
The paper is organized as follows. We review related work and define the IM-RO problem in Section 2. We briefly revisit the introduction of the SDP method to the IM problem in Section 3. We then introduce the proposed heuristics (the LDH, AHC and MPSO algorithms) in Section 4. In Section 5 we provide experimental results and a performance analysis for synthetic networks and real-world OSNs. We conclude the paper in Section 6, summarizing the main contributions and directions for future work.

2 Related Work

The problem of selecting an ideal set of nodes in a graph, or determining which set of users should be marketed to in order to obtain the maximum expected profit from sales, was first studied in [11, 40]. In these papers, the problem was viewed as trying to convince a subset of individuals to purchase a new product or innovation with the goal of generating further purchases over the entire network. In other words, the problem entailed choosing specific users in a network who would create a cascade over the entire network. Solutions to this problem comprised both a nonlinear and a linear probabilistic model that optimized the revenue generated from sales. Subsequently, in [22] this optimization problem was defined as the IM problem, and the emphasis shifted from maximizing the profit generated from sales to maximizing a cascade effect, or the number of activated nodes at the end of a diffusion process, in the context of diffusion models [18, 15, 8, 16]. Threshold models have been studied in the context of sociological theory and social networks in [18, 32, 21]. However, the generalization of the Linear Threshold and Independent Cascade models proposed in [22] lies at the core of most threshold models for the IM problem.
Using the Linear Threshold and Independent Cascade models, the problem was shown to be NP-hard in [22]. Moreover, work done in [38] showed that, using Linear Programming and the greedy algorithm, an approximate solution to the IM problem within a factor of $(1 - 1/e)$ of the optimal solution could be obtained. Approaches to solving influence maximization problems have been put forward in [36, 13]. Similar to the work done in [11, 40, 1], we focus on the selection of an ideal set of nodes for the purpose of optimizing clicks and revenue for the advertiser. We depart from approaches to the problem that utilize influence spread and the theory of submodular functions, as done in several papers [26, 6, 27, 28, 30, 7, 8, 3, 4, 14, 20, 46, 47], and focus on maximizing the expected gains for the advertiser. Our formulation of the problem formally known as the IM problem is novel since, in addition to adopting a decision-making perspective, its main goal is to maximize the expected number of clicks, or revenue, for the advertiser. Thus we define a new problem, the IM-RO problem, and introduce SDP as the method by which this problem can be solved.

Definition 2.1 (Influence Maximization - Revenue Optimization).

Given a network modeled as a graph $G = (V, E)$, a fixed number $M$ of impressions to be placed and the probability of a user clicking on an impression, the problem seeks to find the optimal users with whom to place impressions so as to maximize the expected number of clicks and ultimately revenue.

3 SDP for IM-RO

We consider the multistage SDP approach to the IM problem, first introduced in [19] and now defined as the IM-RO problem. The problem entails placing an integer number $M$ of impressions to OSN users over a fixed number of stages, with the number of stages to go tracked at each step. The aim is to determine the number of impressions to be placed in each stage, that is, the impression-to-stage allocation, and the optimal users that solve equation (1):

(1)

where the impressions allocated over all stages sum to the total number of impressions $M$, subject to the constraint that a user can only be given an impression once.

The state variables record, for each user, whether the user has been given an impression and, if so, whether the user has clicked on it (see [19] for more details).

Figure 1: A simple network on 10 nodes from graph generator
Figure 2: A simple network on 10 nodes randomly drawn

We briefly revisit an evaluation of the SDP method through an analysis on two simple networks, Figure (1) and Figure (2). Tables (1, 2 and 3) provide a concise survey of the SDP model on these networks using both the GIM and NIM as the influence models by which probabilities are updated. The GIM is given by equation (2) and the NIM is given by equation (3).

(2)
(3)

These models depend on the number of a user's friends who have clicked on an impression, the total number of friends of the user, and the user's initial probability of clicking on an impression at the start of the multistage problem. This initial probability is a user's natural inclination to click on an impression in the absence of any influence from friends. Both models involve influence constants; in the NIM, a negative influence constant is associated with the number of users who have been given impressions and have not clicked on them.

Influence Model   Stages   Allocation      User    Expected Clicks   Time (secs)
GIM               2        [2,4]           0,4     2.12              300
GIM               3        [1,2,3]         1       2.36              1,240
GIM               4        [1,1,2,2]       0       2.56              2,720
GIM               5        [1,1,1,2,1]     8       2.64              6,830
GIM               6        [1,1,1,1,1,1]   1       2.69              12,540
NIM               2        [1,2]           0       0.96              244
NIM               3        [2,1,3]         1,2     1.53              1,360
NIM               4        [2,1,3,0]       1,2     1.53              2,320
NIM               5        [2,1,3,0,0]     1,2     1.53              6,560
NIM               6        [2,1,3,0,0,0]   1,2     1.53              12,760
Table 1: 6 impressions with a varying number of stages on the network of Figure 1
Influence Model   Stages   Allocation      User    Expected Clicks   Time (secs)
GIM               2        [3,3]           0,2,4   1.94              240
GIM               3        [2,2,2]         0,4     2.08              1,300
GIM               4        [1,2,2,1]       1       2.13              3,540
GIM               5        [1,1,2,1,1]     1       2.19              5,880
GIM               6        [1,1,1,1,1,1]   3       2.22              12,960
NIM               2        [2,4]           2,3     0.96              230
NIM               3        [2,3,1]         2,3     1.58              1,310
NIM               4        [1,1,3,1]       2       1.58              3,670
NIM               5        [1,1,3,1,0]     2       1.58              6,110
NIM               6        [1,1,3,1,0,0]   2       1.58              12,930
Table 2: 6 impressions with a varying number of stages on the network of Figure 2
Influence Model   Impressions   Allocation   User   Expected Clicks   Time (secs)
GIM               3             [1,1,1]      0      1.04              18
GIM               4             [1,1,2]      0      1.48              90
GIM               5             [1,1,3]      0      1.91              440
GIM               6             [1,2,3]      0      2.36              1,331
NIM               3             [1,1,1]      3      0.8               14
NIM               4             [1,2,1]      3      1.06              110
NIM               5             [1,3,1]      0      1.3               473
NIM               6             [2,4]        7,8    1.53              1,300
Table 3: Increasing the number of impressions in 3 stages on the network of Figure 1

For a single-stage problem involving 6 impressions, the optimal expected number of clicks is calculated as 1.5. The optimal expected number of clicks determined by the SDP method with 6 impressions and 6 stages under the GIM is 2.69 (Table 1), an increase of approximately 80%. These results represent considerable gains for the advertiser in terms of spreading information and optimizing revenue. We note that for the NIM, the optimal expected number of clicks is achieved at 3 stages in both networks, whilst for the GIM the optimal expected click value increases as the number of stages increases. However, the running times needed to achieve these results are infeasible, especially for large networks, hence the need for computationally less expensive heuristics which achieve near-optimal results. For both the GIM and the NIM, a significant increase in the optimal expected number of clicks can be achieved at 3 stages in reasonable time. Another interesting observation is the drastic increase in running time caused by adding a single impression (Table 3). With 5 impressions and 3 stages, the SDP method achieves the optimal solution in approximately 7 minutes on these simple networks.
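The arithmetic behind these observations can be checked with a short sketch. It only reuses figures reported in Tables 1 and 3; the helper name `relative_gain` is introduced here for illustration and is not part of the original method.

```python
# Values copied from Tables 1 and 3 (GIM, network of Figure 1).
single_stage_clicks = 1.5   # 6 impressions placed in a single stage
six_stage_clicks = 2.69     # 6 impressions placed over 6 stages (Table 1)


def relative_gain(baseline, improved):
    """Fractional increase of `improved` over `baseline`."""
    return (improved - baseline) / baseline


gain = relative_gain(single_stage_clicks, six_stage_clicks)

# Running times from Table 3 (GIM, 3 stages) for 3, 4, 5 and 6 impressions.
times = [18, 90, 440, 1331]
# Factor by which the running time grows when one impression is added.
growth = [later / earlier for earlier, later in zip(times, times[1:])]

print("relative gain: {:.1%}".format(gain))
print("growth per added impression:", growth)
```

On these reported values, each additional impression multiplies the running time by a factor of at least three, which is consistent with the rapid growth noted in the text.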
For an asymptotic analysis of the SDP method, we consider a 2-stage problem with impressions to be placed to its users. If we consider the impression-to-stage allocation [1, M-1], then there are possible combinations of users to choose from for this. For [2, M-2], there are possible combinations of users to choose from and possible combinations of users to choose from for [3, M-3]. If we continue counting the steps in this manner until the last impression-to-stage allocation, then using the Binomial Theorem we can prove that is an upper bound on the number of steps to attain the optimal solution. Hence the SDP method has a complexity of in its worst case. For large graphs, this proves to be intractable. In order to reduce its complexity and evaluate the performance of the SDP method on larger networks, we propose heuristics which leverage the optimality of the SDP method whilst reducing its complexity.
Below we describe three heuristics, the LDH, AHC and MPSO, that adopt the multistage aspect of the SDP method. The MPSO, however, is the least reliable in terms of its accuracy compared to the LDH and AHC, since its state space comprises all possible predetermined users in each impression-to-stage allocation and their associated expected number of clicks. This is an essential characteristic of the SDP method and of the LDH and AHC heuristics in attaining the optimal and near-optimal solutions.
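To give a feel for this growth, the following is a minimal counting sketch under a simplifying assumption that is ours, not the paper's: for the allocation [k, M-k], k users are chosen in the first stage and M-k of the remaining users in the second stage, and the branching over click outcomes is ignored. The function names are hypothetical. Under that assumption the total over all 2-stage allocations collapses, via the Binomial Theorem, to C(N, M) * (2^M - 2), where N is the number of users.

```python
from math import factorial


def comb(n, k):
    """Binomial coefficient C(n, k); returns 0 when k is out of range."""
    if k < 0 or k > n:
        return 0
    return factorial(n) // (factorial(k) * factorial(n - k))


def two_stage_user_choices(num_users, num_impressions):
    """Count user selections over all allocations [k, M-k], k = 1..M-1.

    Assumption: k users are picked in stage one and M-k of the remaining
    users in stage two; click outcomes are not enumerated.
    """
    total = 0
    for k in range(1, num_impressions):
        total += comb(num_users, k) * comb(num_users - k, num_impressions - k)
    return total


def closed_form(num_users, num_impressions):
    """Same count written as C(N, M) * (2**M - 2)."""
    return comb(num_users, num_impressions) * (2 ** num_impressions - 2)


# 10 users and 6 impressions, matching the size of the small example networks.
print(two_stage_user_choices(10, 6), closed_form(10, 6))
```

Even with only 10 users and 6 impressions this simplified count already runs into the thousands, and it grows exponentially in the number of impressions.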

4 Heuristics

4.1 LDH

We begin by introducing the LDH as a method which reduces the complexity of the SDP method by reducing its branching factor. For a given 2- or 3-stage problem, the LDH generates the impression-to-stage allocation [1, M-1] or [1, 1, M-2] respectively. Next, the optimal expected number of clicks is computed for this impression-to-stage allocation using equation (1), with the node of highest degree in the graph selected at the first stage. The LDH is inspired by the efficiency of the well-known high-degree heuristics in [22] as well as by the experimental findings of Section 3, in which the optimal solution was achieved in 3 stages. As the LDH expands only one node, corresponding to either [1, M-1] or [1, 1, M-2], its complexity is drastically lower than that of the SDP method.

procedure LDH; Input: G = (V, E), number of impressions M, number of stages, number of iterations
    if the number of stages is 2 then
        impression-to-stage allocation ← [1, M-1]
    else
        impression-to-stage allocation ← [1, 1, M-2]
    select the first node for this impression-to-stage allocation as the node with the highest degree and compute the expected number of clicks generated
    return solution
Algorithm 1 LDH
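The selection step of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch only: the graph is assumed to be an adjacency dictionary, the function name `ldh_plan` and the tie-breaking rule are our own choices, and the evaluation of equation (1) for the chosen allocation is deliberately left out.

```python
def ldh_plan(adjacency, num_impressions, num_stages):
    """Return the LDH impression-to-stage allocation and the first-stage user.

    adjacency: dict mapping each user to the set of that user's friends.
    """
    if num_stages == 2:
        allocation = [1, num_impressions - 1]
    elif num_stages == 3:
        allocation = [1, 1, num_impressions - 2]
    else:
        raise ValueError("the LDH is described for 2 or 3 stages only")
    # Highest-degree user; ties go to the smallest user id (an assumption).
    first_user = min(adjacency, key=lambda u: (-len(adjacency[u]), u))
    return allocation, first_user


# Toy graph: user 0 has three friends and is therefore selected first.
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(ldh_plan(graph, 5, 3))
```

Because only one allocation and one first-stage node are considered, the cost of this step is a single pass over the users.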

4.2 AHC

The hill-climbing search algorithm, often referred to as the greedy hill-climbing algorithm, is an example of a local search algorithm that operates by expanding a single node and navigating to neighboring nodes with the goal of finding the global minimum or maximum, if one exists. The general hill-climbing algorithm and its variants have been proposed in [45, 10]. Moreover, for the IM problem, the greedy hill-climbing algorithm and improvements of this algorithm have been proposed in several papers [22, 27, 30, 28, 6].
We implement an adaptive hill-climbing technique for the IM-RO problem with the functionality of the general hill-climbing algorithm; however, the algorithm expands nodes corresponding to the impression-to-stage allocation [1, M-1] or [1, M-2, 1] for a given 2- or 3-stage problem respectively. The first node to expand in the current stage-to-go is chosen randomly. Based on the click outcomes, the probabilities over the entire network are updated using either the NIM or the GIM, and the expected number of clicks is computed as in the SDP method. Each time a node is randomly chosen in the stage-to-go and the expected number of clicks is computed for the allocation using equation (1), its value is compared to the previous value computed. The AHC algorithm continues randomly expanding nodes in the stage-to-go and computing their associated optimal expected number of clicks for a specified number of iterations $n$. In general, the hill-climbing algorithm does not guarantee the optimal solution; however, it requires only $O(1)$ memory and is quite efficient. We provide the hill-climbing algorithm adapted to the IM-RO problem as follows:

procedure AHC; Input: G = (V, E), number of impressions M, number of stages, number of iterations n
    if the number of stages is 2 then
        impression-to-stage allocation ← [1, M-1]
    else
        impression-to-stage allocation ← [1, M-2, 1]
    for iteration = 1 to n do
        select randomly the first user for this impression-to-stage allocation and obtain the expected number of clicks generated
        if current solution > previous solution then
            keep the current solution
        else
            keep the previous solution
    return the retained solution
Algorithm 2 AHC
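The keep-the-best loop of Algorithm 2 can be sketched as below. The callable `expected_clicks` is a hypothetical stand-in for evaluating equation (1) on the fixed allocation with a given first user; the toy scores are made up purely to exercise the loop and are not results from the paper.

```python
import random


def ahc(users, expected_clicks, num_iterations, seed=None):
    """Keep the best randomly chosen first-stage user over num_iterations tries.

    expected_clicks: callable user -> float (hypothetical stand-in for
    evaluating equation (1) on the fixed impression-to-stage allocation).
    """
    rng = random.Random(seed)
    users = list(users)
    best_user, best_value = None, float("-inf")
    for _ in range(num_iterations):
        candidate = rng.choice(users)
        value = expected_clicks(candidate)
        # Retain the candidate only if it improves on the previous solution.
        if value > best_value:
            best_user, best_value = candidate, value
    return best_user, best_value


# Made-up scores for three users, used only to demonstrate the loop.
toy_scores = {0: 1.2, 1: 1.7, 2: 1.3}
user, value = ahc(toy_scores, toy_scores.get, num_iterations=50, seed=1)
print(user, value)
```

Only the current best pair is stored, which is why the memory footprint stays constant regardless of the number of iterations.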

4.3 MPSO

Particle Swarm Optimization (PSO) was first proposed in [12] as one of the swarm intelligence algorithms for optimizing continuous nonlinear functions. PSO is modeled on the social behavior of swarming observed in insects, fish and birds [24]. The main idea of PSO originated from the movement of bird flocks: the algorithm finds the optimal solution in the search space just as a flock of birds searches for its food. In the original continuous-space PSO algorithm proposed in [12], the particles cooperate with each other in a multidimensional search space in order to move to better positions. A position vector denotes the current solution of each particle, whilst a velocity vector provides the direction of the particle and adjusts the particle's position towards the optimal solution. Various researchers have extended the original PSO algorithm proposed in [12] to discrete optimization problems [9, 46, 42, 43]. The first of this kind was the binary particle swarm (BPSO) proposed in [23]. Similar to the continuous-space PSO algorithm, the discrete-space PSO algorithm involves the following update rules:

(4)
(5)

Each particle maintains both a position and a velocity over the iterations, together with a vector representing the personal best solution of the particle (PBest) and the global best solution obtained by the entire swarm (GBest). Two parameters weigh each particle's own experience and the experience of the entire swarm respectively, whilst two further constants take values in [0,1]. At each iteration, the particle's velocity is updated using its own search experience and the experience of the entire swarm as it flies to a new search position.
For the implementation of the MPSO algorithm, the state space comprises the set of possible predetermined users in each impression-to-stage allocation together with their corresponding expected number of clicks. We modify and make use of a key concept called a Swap Operator, proposed in [46], to handle discrete PSO problems. A solution set can be described as a specific impression-to-stage allocation in which all of the users are identified. We define three Swap Operators: interchanging user $i$ with the user in a given position, adding user $i$ to a given position in the stage to go, and removing user $i$ from a given position in the stage to go. Using these Swap Operators we can redefine addition on the solution sets, applying an operator to a solution to obtain a new solution. That is,
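For reference, the standard continuous PSO update from the general PSO literature can be written as follows. This is a sketch of the textbook form, not a reconstruction of equations (4) and (5): the symbols $x_i^t$ and $v_i^t$ (position and velocity of particle $i$ at iteration $t$), $c_1, c_2$ (weights on the particle's own and the swarm's experience) and $r_1, r_2 \in [0,1]$ are notation assumed here, and the paper's own formulation may differ, for instance by including an inertia term.

```latex
\[
\begin{aligned}
v_i^{t+1} &= v_i^{t} + c_1 r_1 \left(\mathrm{PBest}_i - x_i^{t}\right) + c_2 r_2 \left(\mathrm{GBest} - x_i^{t}\right), \\
x_i^{t+1} &= x_i^{t} + v_i^{t+1}.
\end{aligned}
\]
```

In a discrete setting such as the MPSO, the differences and sums in this form have no direct meaning, which is why subtraction and addition on solutions have to be redefined in terms of swap sequences.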

(6)
(7)
(8)

A swap sequence is a sequence made up of one or more of the Swap Operators defined in equations (6), (7) and (8). We redefine subtraction on two solutions as the Swap Sequence that acts on one solution in order to obtain the other.
For example, consider an SDP formulation of the IM-RO problem involving 4 impressions and 2 stages, with two solutions. The first solution has allocation [2, 2], with users 1, 2 in the first stage and users 3, 5 in the second stage. The second solution has allocation [1, 3], with user 5 in the first stage and users 2, 3, 1 in the second stage.

Starting from the second solution, we can apply the removal Swap Operator to remove user 2 from the first position of the second stage, obtaining a new solution with allocation [1, 2], with user 5 in the first stage and users 3, 1 in the second stage. The addition Swap Operator can then be applied to add user 2 at position 2 of the first stage, obtaining a new solution with allocation [2, 2], with users 5, 2 in the first stage and users 3, 1 in the second stage. The interchange Swap Operator is applied last and interchanges the user in position 1 with user 1, giving allocation [2, 2] with users 1, 2 in the first stage and users 3, 5 in the second stage, which is the first solution. Hence, a swap sequence with the least number of operators that transforms the second solution into the first consists of these three operators. In implementing the MPSO, the velocity is updated using equation (4) and applying the relevant swap sequences. We provide the procedure as Algorithm 3:
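The worked example above can be replayed with a small Python sketch. The list-of-stages representation, the function names and the indexing convention (0-based stage index, 1-based position within a stage) are assumptions made for illustration; they are not the paper's notation for the Swap Operators.

```python
def remove_user(solution, user):
    """Return a copy of the solution with `user` removed from its stage."""
    return [[u for u in stage if u != user] for stage in solution]


def add_user(solution, user, stage, position):
    """Return a copy with `user` inserted at 1-based `position` of `stage`."""
    new = [list(s) for s in solution]
    new[stage].insert(position - 1, user)
    return new


def interchange(solution, user, stage, position):
    """Swap `user` with whoever sits at 1-based `position` of `stage`."""
    new = [list(s) for s in solution]
    other = new[stage][position - 1]
    for s in new:
        for idx, u in enumerate(s):
            if u == user:
                s[idx] = other
            elif u == other:
                s[idx] = user
    return new


def allocation(solution):
    """Impression-to-stage allocation implied by a solution."""
    return [len(stage) for stage in solution]


start = [[5], [2, 3, 1]]              # allocation [1, 3]
step1 = remove_user(start, 2)         # user 2 leaves the second stage
step2 = add_user(step1, 2, 0, 2)      # user 2 joins the first stage at position 2
step3 = interchange(step2, 1, 0, 1)   # user 1 trades places with position 1
print(step3, allocation(step3))
```

Each operator returns a new solution rather than mutating its input, so a swap sequence is simply the ordered list of such calls.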

procedure MPSO; Input: G = (V, E), swarm size, number of iterations, number of stages
    for each particle in the swarm do
        initialize position
        initialize PBest
        initialize velocity to 0
    based on click values, select the global best GBest
    for each iteration do
        update velocity
        update position
        update PBest and select GBest in this iteration
        update GBest as the best position found so far
    return GBest as the best position (solution) to the IM-RO problem
Algorithm 3 MPSO

5 Experiments

We evaluated the effectiveness of the proposed heuristics using synthetic and real-world OSNs.

5.1 Datasets

We employed various synthetic networks and two real-world OSNs represented as graphs to analyze each method. Synthetic networks of sizes 10, 50, 100, 500, 1000, 2000, 4000, 4500 and 5200 were generated using a pseudo-random number generator as done in [33]. From a sample of 10 generated synthetic graphs, the average node degree was found to be at least 60% of the number of nodes in the graphs.
In addition to these networks, we utilized two real-world OSNs, Flickr and Epinions, obtained from the Social Computing Data Repository [48] and the Stanford Network Analysis Platform [31] respectively. Flickr is an image hosting and video sharing website where users can share images with each other. In this network, "1,2" represents the friendship relationship between user id 1 and user id 2. The entire dataset consists of 80,513 nodes. From this we extracted two datasets for the purpose of evaluating each heuristic: FL1, comprising 11,098 nodes, and FL2, comprising 20,217 nodes, each with an average node degree of 2.
Epinions is a customer review OSN in which users rate various products that are purchased on eBay. The entire dataset consists of 75,879 nodes, from which we extracted a dataset of 4,382 nodes with an average node degree of 3; we refer to this dataset as Ep.

5.2 Experimental Settings

Influence models for the IM problem can be described as models which capture real-world propagations, or the spread of information, among users within a network. In addition to the diffusion models defined in [22], namely the Linear Threshold and Independent Cascade models, influence models that determine node and edge probabilities have been proposed in [11, 40, 13, 16, 4, 5]. For the IM-RO problem we use the GIM (equation (2)) and the NIM (equation (3)) as the influence models by which probabilities are updated at the end of each stage. The SDP method adopts a multistage approach, and at each stage users are provided with advertising links, or impressions. At the end of each stage, the outcomes (whether each user has clicked or not) are determined, and this information is used in the influence models to update the probabilities for future stages. The objective is thus to determine the number of impressions to be placed at each stage and the users with whom to place them, so as to maximize the number of purchases. A user clicking on an impression is equated to a user purchasing a product; therefore, optimizing the revenue generated is identical to optimizing the expected number of clicks. A user's initial probability of clicking was set arbitrarily for these experiments, and the influence constants were also arbitrarily set to 0.25. In future work, we will demonstrate that such parameters can be effectively estimated using data mining techniques.
All experiments were run on a server with 8 GB of RAM and an i3 processor. The SDP method and the LDH, AHC and MPSO heuristics were implemented from scratch in Python 2.7 (64-bit), and each reported result is the average of 10 runs.

5.3 Performance Analysis on Synthetic Networks

Influence Model   Graph Size   Iteration (n)   Optimal Clicks   Time (secs)
GIM               50           1               1.675            5
GIM               50           5               1.688            10
GIM               50           10              1.727            22
GIM               50           20              1.718            38
GIM               50           50              1.821            78
GIM               500          1               1.673            17
GIM               500          5               1.673            98
GIM               500          10              1.673            280
GIM               500          20              1.673            360
GIM               500          50              1.709            981
GIM               500          100             1.721            2,348
GIM               2000         1               1.673            231
GIM               2000         5               1.673            1,237
GIM               2000         10              1.673            2,257
GIM               2000         20              1.71             3,615
GIM               4500         1               1.672            1,218
GIM               4500         5               1.672            7,215
GIM               4500         10              1.672            14,427
NIM               50           1               1.260            5
NIM               50           5               1.261            8
NIM               50           10              1.262            15
NIM               50           20              1.262            25
NIM               50           50              1.263            53
NIM               500          1               1.251            23
NIM               500          5               1.251            85
NIM               500          10              1.251            187
NIM               500          20              1.251            327
NIM               500          50              1.251            877
NIM               500          100             1.252            2,060
NIM               2000         1               1.250            215
NIM               2000         5               1.250            1,311
NIM               2000         10              1.250            2,276
NIM               2000         20              1.250            3,616
NIM               4500         1               1.250            1,198
NIM               4500         5               1.250            7,228
NIM               4500         10              1.250            18,031
Table 4: Results for 5 impressions and 3 stages with AHC
Datasets
500
1000
2000
4500
5200
Table 5: 3 stages, with LDH under GIM
Graph Size   Swarm Size   Iteration   Optimal Clicks   Time (secs)
50           10           1           1.260            5
50           10           10          1.261            50
50           10           20          1.261            63
50           10           40          1.261            117
50           10           80          1.261            261
50           10           100         1.261            275
50           50           1           1.262            417
50           50           10          1.262            154
50           50           20          1.262            294
50           50           40          1.262            535
50           50           80          1.262            1,335
50           100          1           1.262            63
50           100          10          1.262            284
50           100          20          1.262            608
50           100          40          1.262            1,062
50           100          80          1.262            2,176
500          10           1           1.251            114
500          10           10          1.251            357
500          10           20          1.251            697
500          10           40          1.251            1,706
500          10           80          1.251            3,606
500          20           1           1.251            179
500          20           10          1.251            973
500          20           20          1.251            2,221
500          20           40          1.251            3,617
500          20           80          1.251            7,440
500          50           1           1.251            467
500          50           10          1.251            2,258
500          50           20          1.251            4,920
500          50           40          1.251            7,257
2000         10           1           1.250            1,998
2000         10           10          1.250            10,838
Table 6: Results for 5 impressions in 2 stages for MPSO under NIM

The results in Table (5) show the optimal expected number of clicks and running times of the LDH under the GIM. As shown in Table (5), the LDH is orders of magnitude faster than the SDP method, achieving a "good" solution of 1.45 in less than an hour on a synthetic graph of 5200 nodes. For now, we can think of a "good" solution as one that is at least as high as the value obtained by placing all the impressions in one stage. In future work we will obtain an upper bound on the optimal solution, as this will provide greater insight into what constitutes a reasonable solution and how well these heuristics perform on large graphs. We note that the optimal expected number of clicks determined by the SDP method on a graph of 10 nodes was found to be 1.91 under the same model parameters as this experiment. We further note that an increase in the value of the model parameter, even when it is assigned small values, results in an increase in the expected number of clicks. Hence we propose the LDH as a reasonable and promising method which leverages the accuracy of the SDP method whilst reducing its complexity.
To evaluate the performance of the AHC algorithm, we varied the graph sizes and the number of iterations. The results in Table (4) indicate that the optimal expected number of clicks increases with the number of iterations, particularly for large values of $n$, as seen in the graphs of 50 and 500 nodes. Under the NIM, the AHC algorithm generates a value of 1.250 for a graph of 2000 nodes in fewer than 50 iterations and 1.252 for a graph of 500 nodes in 100 iterations, both in less than an hour. Under the GIM, the AHC generates higher values, as high as 1.821 for a graph of 50 nodes in 50 iterations. For a network of 2000 nodes, the AHC generates 1.71 clicks in approximately one hour. However, for a graph of 4,500 nodes, notably under the NIM, the AHC proves to be infeasible, taking 5 hours to generate 1.250 clicks in 10 iterations. Hence we consider the AHC a moderately efficient method for obtaining near-optimal solutions to the IM-RO problem, taking into consideration that (1) increasing the number of iterations increases both the optimal expected number of clicks and the running time, and (2) utilizing ideal influence model parameters can generate higher optimal expected click values.
Table (6) provides an analysis of the MPSO method on the IM-RO problem with both weighting parameters set to 0.5. In particular, we note the effect of increasing the swarm size and the number of iterations on the optimal expected number of clicks. For a network of 50 users and fewer than 10 iterations, the MPSO method generates "good" results in minutes under the NIM. However, for larger graphs (i.e., more than 500 nodes), the MPSO converges slowly, taking hours to converge to less accurate solutions. This is primarily because its running time increases significantly with its swarm size and number of iterations. From the analysis in Table (6), we can conclude that the MPSO method is a fairly reasonable algorithm in terms of achieving near-optimal solutions; however, its running time is too slow, making it infeasible for large graphs. Moreover, it is unreliable in terms of accuracy, since its state space consists of a set of optimal expected click values for predetermined users in impression-to-stage allocations.

Figure 3: NIM with 5 impressions in 2 stages
Figure 4: GIM with 5 impressions in 2 stages

We observe Figure (3) and Figure (4) and note the effect of varying the model parameter on all three methods. The results indicate that the LDH and AHC generate identical optimal expected click values on various synthetic networks. We highlight the significant increase in the optimal expected number of clicks from 1.0 to 1.5. These results represent considerable gains for any OSN advertiser and have significant implications for the choice of influence models and for the effect of optimizing influence model parameters in maximizing the expected number of clicks. The similarity in performance of the LDH and AHC algorithms can also be attributed to the similarity of the synthetic networks, each being generated by the same random number generator. We note that the MPSO algorithm generates the highest expected number of clicks for all graph sizes; however, its running time is too slow for large graphs, a result further supported by our scalability analysis.

5.4 Scalability

To evaluate scalability, the size of the synthetic networks was doubled repeatedly: 250, 500, 1000, …, up to 4000 nodes.

Figure 5: Regular scale
Figure 6: Log-Log Scale

Figures 5 and 6 show the running times of the LDH, AHC and MPSO methods on a regular scale and a log-log scale respectively. From the results in Figure 5, we can clearly deduce that the MPSO algorithm is not scalable, since its running time is in the hour range for 2000 nodes, making it infeasible to run on larger graphs. We also note that the graphs generated by the pseudorandom number generator have high degree, which makes them reasonable indicators for a wide range of datasets. Figure 6 provides a further differentiation between the algorithms. From these results we conclude that all three algorithms have similar slopes; however, the LDH and AHC have both a good slope and a good intercept, making them suitable for large graphs with at least thousands of nodes and edges.
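To make the reading of the log-log plot concrete: if running time grows roughly as T(n) ≈ c·n^k, then log T = k·log n + log c, so the slope of the fitted line estimates the exponent k and the intercept estimates the constant factor c. The following is a minimal sketch of such a fit. The running times in it are hypothetical placeholders, not the measurements behind Figures 5 and 6, and `fit_power_law` is an illustrative helper name rather than part of our implementation.

```python
import math


def fit_power_law(sizes, times):
    """Least-squares fit of log(time) = slope * log(size) + intercept."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Network sizes doubled from 250 up to 4000 nodes, as in the experiment.
sizes = [250 * 2 ** k for k in range(5)]

# HYPOTHETICAL running times in seconds (assumed quadratic growth):
# same exponent for both methods, very different constant factors.
fast_times = [0.5 * (n / 250) ** 2 for n in sizes]
slow_times = [60.0 * (n / 250) ** 2 for n in sizes]

slope_fast, intercept_fast = fit_power_law(sizes, fast_times)
slope_slow, intercept_slow = fit_power_law(sizes, slow_times)

print("fast method: slope %.2f, intercept %.2f" % (slope_fast, intercept_fast))
print("slow method: slope %.2f, intercept %.2f" % (slope_slow, intercept_slow))
```

In this toy setting both methods have the same slope, so they scale at the same rate, but the higher intercept of the slow method corresponds to a constant factor of 120 in running time. This is the sense in which algorithms with similar slopes can still differ greatly in practicality.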

5.5 Performance on real-world OSNs

We compare the computational time and optimal expected number of clicks generated by the LDH and AHC heuristics on two real-world OSNs under the GIM with model parameters , and 10 iterations. Table 7 and Figure 7 indicate that the LDH performs as well as or better than the AHC heuristic in terms of the optimal number of clicks generated on the Epinions dataset, whilst the AHC generates significantly higher optimal expected click values on the Flickr dataset. We attribute these results to the design of the LDH being better suited to the structure of the Epinions OSN than to that of Flickr. Indeed, while both the LDH and the AHC heuristic achieve near-optimal solutions in a run time of under 30 minutes even for a network of 20,217 users, the LDH attains the optimal values in seconds for all three networks. In general, the AHC is well suited to both the Epinions and Flickr OSNs in terms of its accuracy and running times. For a problem involving 5 impressions in 3 stages, the optimal expected number of clicks generated by the AHC is approximately 2 on every network. Although the LDH generates similar results for a problem involving 5 impressions on the Epinions dataset, the optimal expected number of clicks it generates on the Flickr dataset is 1 even when the number of stages is increased.

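Method  OSN         Impressions  Stages  Optimal Clicks  Time (secs)
LDH     Ep,4,382    5            2       1.56            4.4
LDH     Ep,4,382    10           2       3               4.7
LDH     Ep,4,382    50           2       13              4.7
LDH     Ep,4,382    100          2       25.5            4.9
LDH     Ep,4,382    200          2       50.5            5.5
LDH     Ep,4,382    5            3       2.01            30
LDH     Fl1,11,098  5            2       1               12.6
LDH     Fl1,11,098  10           2       2.25            10
LDH     Fl1,11,098  50           2       12.25           10
LDH     Fl1,11,098  100          2       24.75           10
LDH     Fl1,11,098  200          2       49.75           10
LDH     Fl1,11,098  5            3       1               85
LDH     Fl2,20,217  5            2       1               15
LDH     Fl2,20,217  10           2       2.25            15
LDH     Fl2,20,217  50           2       12.25           16
LDH     Fl2,20,217  100          2       24.75           15
LDH     Fl2,20,217  200          2       49.75           16
LDH     Fl2,20,217  5            3       1               83
AHC     Ep,4,382    5            2       1               20
AHC     Ep,4,382    10           2       3.3             33
AHC     Ep,4,382    50           2       13              389
AHC     Ep,4,382    100          2       25.5            40
AHC     Ep,4,382    200          2       50.6            42
AHC     Ep,4,382    5            3       2.1             45
AHC     Fl1,11,098  5            2       1.5             56
AHC     Fl1,11,098  10           2       2.8             65
AHC     Fl1,11,098  50           2       12.9            65
AHC     Fl1,11,098  100          2       25.3            71
AHC     Fl1,11,098  200          2       50.3            97
AHC     Fl1,11,098  5            3       1.9             603
AHC     Fl2,20,217  5            2       1.55            163
AHC     Fl2,20,217  10           2       3.25            1701
AHC     Fl2,20,217  50           2       13.13           168
AHC     Fl2,20,217  100          2       26.2            160
AHC     Fl2,20,217  200          2       51.6            157
AHC     Fl2,20,217  5            3       2               1,298
Table 7: Results on real-world OSNs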
Figure 7: Real World Datasets
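The trade-off between the two heuristics can be quantified directly from Table 7. The sketch below is illustrative only: it uses the rows for 200 impressions in 2 stages, copied from the table, and reports for each network how many additional expected clicks the AHC obtains over the LDH and how much longer it runs. The dictionary layout and the helper name `compare` are assumptions made for this example.

```python
# Rows copied from Table 7 (200 impressions, 2 stages):
# network -> {method: (optimal expected clicks, time in seconds)}
table7_200_impressions = {
    "Ep": {"LDH": (50.5, 5.5), "AHC": (50.6, 42)},
    "Fl1": {"LDH": (49.75, 10), "AHC": (50.3, 97)},
    "Fl2": {"LDH": (49.75, 16), "AHC": (51.6, 157)},
}


def compare(rows):
    """Per network: AHC click gain over LDH and AHC/LDH running-time ratio."""
    summary = {}
    for network, methods in rows.items():
        ldh_clicks, ldh_time = methods["LDH"]
        ahc_clicks, ahc_time = methods["AHC"]
        summary[network] = {
            "click_gain": round(ahc_clicks - ldh_clicks, 2),
            "time_ratio": round(ahc_time / ldh_time, 2),
        }
    return summary


summary = compare(table7_200_impressions)
for network, stats in summary.items():
    print(network, stats)
```

For these rows the AHC gains between 0.1 and 1.85 expected clicks over the LDH while taking roughly 8 to 10 times as long, which suggests the choice between the two depends on whether accuracy or running time matters more for a given campaign.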

The LDH and AHC heuristics exhibit good performance and are orders of magnitude faster than the SDP method. The results for these heuristics suggest that advertising companies can identify the optimal users to market (or spread information) to in OSNs in a way that generates predictable and lucrative gains for socio-economic advancement.

6 Conclusion

We provide a novel approach to influence maximization based on SDP, a method which until now has been primarily used in resource allocation and shortest path problems. We depart from previous approaches to influence maximization based on the theory of submodular functions and adopt a practical decision-making approach geared towards maximizing clicks and revenue among users of an OSN. Hence we redefine the problem as IM-RO and introduce SDP as the method by which this problem can be solved. We first reviewed the properties of the SDP method on small synthetic networks and highlighted the lucrative advantages that our method offers to advertising companies in terms of generating revenue and optimizing clicks. Due to the complexity of the SDP method, we sought heuristics which achieve near-optimal solutions in considerably less time.
Second, we proposed three heuristics, the LDH, AHC and MPSO algorithms, which exploit the multistage attribute of the SDP method whilst reducing its complexity. In addition to achieving near-optimal solutions, all three methods were found to be orders of magnitude faster than the SDP method. We provided a scalability analysis and evaluated our proposed heuristics on synthetic networks of various sizes and two real-world OSNs, Flickr and Epinions. The LDH and AHC are shown to be well-suited heuristics for the SDP method in terms of their accuracy, scalability and running times. On the two real-world OSNs, the AHC is the more accurate heuristic on the Flickr networks, whereas the LDH performs as well as or better than the AHC on Epinions and has the shorter running times in Table 7.
We confirmed that the GIM exceeded the NIM in the optimal expected number of clicks generated, with approximately the same computational times. It was shown that increasing within both influence models significantly increased the optimal expected number of clicks even when remained small, e.g. . This result has substantial implications for the potential gains from obtaining ideal influence models and optimizing their associated model parameters.
Our immediate future work is to provide an extensive analysis of our influence models and of how their parameters affect the IM-RO problem. It is also necessary to obtain accurate estimates of the influence model parameters through statistical and data mining techniques in order to improve on the optimality of the expected number of clicks.
As an immediate consequence of approaching the IM problem from a decision-making perspective, there are multiple directions for future work, both in terms of optimization (approximate dynamic programming methods) and data science. The results presented provide an evaluation of our methods on large networks. It is also necessary to derive an upper bound on the objective function in order to determine how well our methods perform on these large networks. Another direction for future work related to influence maximization is to obtain the influence spread for the IM-RO problem, where the influence spread is defined as a function of the number of stages of the problem. Our future work also includes exploring applications of this approach in the fields of healthcare, communication, epidemiology, education, and agriculture.

References

  • [1] Abbassi, Z., Bhaskara, A., Misra, V.: Optimizing display advertising in online social networks. In Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee. 1-11 (2015).
  • [2] Bertsekas, D. P., & Tsitsiklis, J. N., "An Analysis of Stochastic Shortest Path Problems", Mathematics of Operations Research. 16, 580-595 (1991)
  • [3] Bhagat, S., Goyal, A., Lakshmanan, L.: Maximizing product adoption in social networks. In Proceedings of the 5th ACM International Conference on Web search and Data Mining. ACM. 603-612 (2012).
  • [4] Cao, T., Wu, X., Hu, T.X., Wang, S.: Active learning of model parameters for influence maximization. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer. 280-295 (2011).
  • [5] Chakrabarti, S., Dom, B., Indyk, P.: Enhanced hypertext categorization using hyperlinks. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, ACM. 307-318 (1998)
  • [6] Chen, W., Wang, Y., Yang, S.: Efficient influence maximization in social networks. In Proceedings of the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM. 199-208 (2009).
  • [7] Chen, W., Wang, C., Wang, Y. : Scalable influence maximization for prevalent viral marketing in large scale social networks. In Proceedings of the 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM. 1029–1038 (2010)
  • [8] Chen, W., Collins, A., Cummings, R.: Influence maximization in social networks when negative opinions may emerge and propagate. SIAM SDM, 11: 379-390 (2011)
  • [9] Clerc, M.: Discrete particle Swarm Optimization, in New Optimization Techniques in Engineering. New York, Springer-Verlag (2004).
  • [10] Davis, L.D.: Bit-climbing, representational bias, and test suite design. In R. K. Belew and L. B. Booker (eds.), Proceedings of the Fourth International Conference on Genetic Algorithms, pp 18-23. CA: Morgan Kaufmann, San Mateo (1991)
  • [11] Domingos, P. & Richardson, M. : Mining the network value of customers. In Proceedings of the 7th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM. 57-66 (2001)
  • [12] Eberhart, R.C., Kennedy, J.: A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, 1, 39–43 (1995)
  • [13] Galhotra S., Arora A., Shourya R.: Holistic Influence Maximization: Combining Scalability and Efficiency with Opinion-Aware Models. In Proceedings of the 2016 International Conference on Management of Data. SIGMOD. 743-758 (2016)
  • [14] Galstyan, A., Musoyan, V., Cohen, P.: Maximizing influence propagation in networks with community structure. Phys. Rev. E. 79(5), (2009)
  • [15] Gomez-Rodriguez, M., Leskovec, J., Krause, A.: Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM. 1019-1028 (2010)
  • [16] Goyal, A., Bonchi, F., & Lakshmanan, L.: Learning influence probabilities in social networks. In Proceedings of the third ACM international conference on Web search and data mining. 241-250. (2010)
  • [17] Goyal, A., Bonchi, F., Lakshmanan, L.: A data-based approach to social influence maximization. In Proceedings of the 38th international conference of the VLDB Endowment. ACM. 73-84. (2011)
  • [18] Granovetter, M. Threshold models of collective behavior. The American Journal of Sociology (6), 1420–1443. (1978)
  • [19] Hosein, P., Lawrence, T.: Stochastic dynamic model for revenue optimization in social networks. In Proceedings of the 11th International Conference On Wireless and Mobile Computing, Networking and Communications.IEEE. 378-383 (2015)
  • [20] Hosseini-Pozveh, M., Zamanifar, K., Naghsh-Nilchi, A., Dolog, P.: Maximizing the spread of positive influence in signed social networks. Intelligent Data Analysis. 20(1) 199-218 (2016)
  • [21] Jackson, M. & Yariv, L.: Diffusion on social networks. Economie Publique. 16 69-82. (2005).
  • [22] Kempe, D., Kleinberg, J., Tardos, E.: Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM. 137-146 (2003).
  • [23] Kennedy, J., Eberhart, R.C.: A discrete binary version of the particle swarm algorithm. In Proceedings of IEEE International Conference on Computational Cybernetics and Simulation. IEEE. 5, 4104-4108 (1997).
  • [24] Kennedy, J., Eberhart, R.C., Shi, Y.: Swarm Intelligence. Morgan Kaufmann Publishers, San Francisco, CA, (2004)
  • [25] Kimura, M., Saito, K., Motoda, H.: Minimizing the Spread of Contamination by Blocking links in a network. Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence. (2008).
  • [26] Kimura, M., Saito, K.: Approximate solutions for the influence maximization problem in a social network. Knowledge-Based Intelligent Information and Engineering Systems. LNCS. 4252: 937-44 (2006)
  • [27] Kimura, M., Saito, K., Motoda, H.: Minimizing the Spread of Contamination by Blocking links in a network. Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence. (2008).
  • [28] Liu, B., Cong, G., Xu, D.: Time constrained influence maximization in social networks. In Proceedings of the 12th IEEE International Conference on Data Mining. IEEE. 439-48. (2012).
  • [29] Levi, R., Roundy, R., Shmoys, D.B. Provably near-optimal sampling-based policies for stochastic inventory control models. Math. Oper. Res. 32 821-839. (2007)
  • [30] Leskovec, J., Krause, A., Guestrin, C.: Cost effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM. 420-429. (2007).
  • [31] Leskovec, J. & Krevl, A.: SNAP Datasets: Stanford Large Network Dataset Collection. (2014)
    http://snap.stanford.edu/data
  • [32] Macy, M. & Willer, R.: From Factors to Actors: Computational Sociology and Agent-Based Modeling. Ann. Rev. Soc. (2002)
  • [33] Matsumoto, M. & Nishimura, T.: “Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator”. ACM Transactions on Modeling and Computer Simulation. 8. 3-30 (1998)
  • [34] Meerman, D.S. Viral Marketing: Let the world tell your story for free [online], Pragmatic Marketing, Available from: <http://www.pragmatic marketing.com/publications/magazine/5/5/viral-marketing-let-the- worldtell-your-story-for-free>,(2008)Accessed 10 Oct 2017
  • [35] Morone, F. & Makse, H.: Influence maximization in complex networks through optimal percolation. Nature. 524, 65-68 (2015)
  • [36] Narayanam, R., Narahari, Y.: Determining the top-k nodes in social networks using the Shapley value. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems. ACM. 1509-1512. (2008).
  • [37] Nascimento, J. & Powell, W. :An Optimal Approximate Dynamic Programming Algorithm for the Economic Dispatch Problem with Grid-Level Storage, IEEE Transactions on Automatic Control (2013)
  • [38] Nemhauser, G.L & Wolsey, L.A. : An Analysis Of Approximations For Maximizing Submodular Set Functions-I. Mathematical Programming. 14 265-294 (1978).
  • [39] Powell, W. B. : Exploration Versus Exploitation, in Approximate Dynamic Programming: Solving the Curses of Dimensionality, Second Edition, John Wiley & Sons, Inc., Hoboken, NJ, USA. (2011) doi: 10.1002/9781118029176.ch12
  • [40] Richardson, M. & Domingos, P.: Mining knowledge sharing sites for viral marketing. In Proceedings of the 8th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM. 61-70 (2002)
  • [41] Saito, K., Nakano, R., Kimura, M. : Prediction of information diffusion probabilities for independent cascade model. Knowledge- Based Intelligent and Engineering Systems. 67-75 (2008).
  • [42] Salman, A., Ahmad, I., Al-Mahadi, S.: Particle Swarm Optimization for task assignment problems. Microprocessor microsystems. 26, 363-371 (2002)
  • [43] Sha, D.Y. & Hsu, C. : A hybrid particle swarm optimization for job scheduling problem. Computers and Industrial Engineering. 51, 791–808 (2006)
  • [44] Singer, Y.: How to Win Friends and Influence People, Truthfully: Influence Maximization Mechanisms for Social Networks. Fifth ACM Int Conf Web Search Data Min. 1-10 (2012)
  • [45] Tsang, E. & Voudouris, C.: Fast local search and guided local search and their application to British Telecom’s workforce scheduling. Technical Report CSM-246, Department of Computer Science, University of Essex, Colchester, UK, (1995)
  • [46] Wang, Y., Cong, G., Song, G.: Community-based greedy algorithm for mining top-k influential nodes in mobile social networks. In Proceedings of the 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. DOI: 10.1145/1835804.1835935. (2010)
  • [47] Wu, Hao-Hsiang & Kucukyavuz, S.: Maximizing Influence in Social Networks: A Two-Stage Stochastic Programming Approach That Exploits Submodularity. Department of Integrated Systems Engineering, The Ohio State University, Columbus, OH. (2016).
  • [48] Zafarani, R. & Liu, H. Social Computing Data Repository at ASU (2009) [http://socialcomputing.asu.edu]. Tempe, AZ: Arizona State University, School of Computing, Informatics and Decision Systems Engineering
  • [49] Facebook Reports Fourth Quarter and Full Year 2016 Results. https://s21.q4cdn.com/399680738/files/doc_financials/2016/Q4/Facebook-Reports-Fourth-Quarter-and-Full-Year-2016-Results.pdf (2016). Accessed 3 January 2017