Online Learning and Optimization Under a New Linear Threshold Model with Negative Influence
Shuoguang Yang, Shatian Wang, VanAnh Truong \AFFDepartment of Industrial Engineering and Operations Research, Columbia University, New York, NY 10027, \EMAILsy2614@columbia.edu, \EMAILsw3219@columbia.edu, \EMAILvt2196@columbia.edu
We propose a new class of Linear Threshold Model-based information-diffusion models that incorporate the formation and spread of negative attitude. We call such models negativity-aware. We show that in these models, the expected positive influence is a monotone submodular function of the seed set. Thus we can use a greedy algorithm to construct a solution with a constant approximation guarantee when the objective is to select a seed set of fixed size to maximize positive influence. Our models are flexible enough to account for both the features of local users and the features of the information being propagated in the diffusion.
We analyze an online-learning setting for a multi-round influence-maximization problem, where an agent is actively learning the diffusion parameters over time while trying to maximize total cumulative positive influence. We assume that in each diffusion step, the agent can only observe whether a node becomes positively or negatively influenced, or remains inactive. In particular, she does not observe the particular edge that brought about the activation of a node, if any, as is the case in most models that assume Independent Cascade (IC)-based diffusions. This model of feedback is called node-level feedback, as opposed to the more common edge-level feedback model, in which she is able to observe, for each node, the edge through which that node is influenced. Under mild assumptions, we develop online learning algorithms that achieve cumulative expected regrets of order $O(T^{1-c})$ for any constant $c < 1$, where $T$ is the total number of rounds. These are the first regret guarantees for node-level feedback models of influence maximization of any kind. Furthermore, under mild assumptions, this result also improves on the average regret of $O(1/\sqrt{T})$ for the edge-level feedback model in wen2017online, thus providing a new performance benchmark.
1 Introduction
As online social networks become increasingly integrated into our daily lives, popular platforms such as Facebook, Twitter, and YouTube have turned into important and effective media for advertising products and spreading ideas. Commercially, it has become routine for brands to use the word-of-mouth effect to promote products on social networks. In other spheres, politicians, activists, and even ordinary people can leverage these networks to instigate political and social change. Given the immense power of social networks in spreading information and ideas, it is not uncommon to see social network marketing campaigns backfire. Even when a campaign is carefully designed, negative reactions might still arise due to the controversial nature of the information being propagated. It is therefore necessary to consider frameworks that allow for the formation and spread of negative attitude, whose likelihood depends on heterogeneous demographics.
Motivated by the potential emergence of negative attitudes in social networks, we consider a negativity-aware multi-round influence maximization problem. In our problem, an agent, hoping to promote certain information, conducts a marketing campaign over a time horizon, for example, three months. The time horizon is further divided into rounds, such as one-week periods. At the beginning of each round, the agent selects a fixed-cardinality seed set of users, called influencers, in the network. These users initiate a cascade of information spread through the network. The agent then closely monitors the subsequent influence diffusion process in the social network. The rounds are independent and the round rewards are cumulative. The agent is aware of the potential emergence of negative reactions and possible negative influence during the diffusion process, but is initially unaware of the underlying parameters that govern the attitude diffusion. Her goal is to simultaneously perform two actions: first, to learn the parameters via the feedback she gathers during monitoring; second, to select the seed set in each round so as to maximize the total expected number of positively influenced users over all rounds. Our problem is related to the (Online) Influence Maximization literature. While most existing works model only positive influence, the works that do consider the spread of negative attitude are either not flexible enough to capture important real-world characteristics or are intractable due to a lack of desirable mathematical properties. Also, to the best of our knowledge, there is no influence maximization framework that captures both online learning and the potential spread of negative attitude.
In this paper, we propose a novel class of Linear Threshold-based information-diffusion models that incorporate the formation and spread of negative attitude. We call such models negativity-aware. We show that in these models, the expected positive influence function is monotone submodular. Thus we can use the greedy algorithm to construct seed sets of fixed sizes with constant approximation guarantees, when the objective is to maximize expected positive influence. Our models are flexible enough to account for both the features of local users and the features of the information being propagated in the diffusion.
Next, we analyze an online-learning setting for a multi-round influence-maximization problem, where an agent is actively learning the diffusion parameters over time while trying to maximize total cumulative positive influence. We assume that in each diffusion step, the agent observes whether a node becomes positively or negatively influenced, or remains inactive. This assumption reflects the reality that network activity typically permits only two kinds of measurable observations: first, while we are able to identify an activated user, we are not able to observe the specific contributions of his neighbours; second, we are able to observe the time of activation. For example, on Twitter, assume Charlie is a follower of both Andrew and Bob. If Andrew and Bob both retweet a story, and Charlie further retweets that story, we cannot determine whether Andrew's influence on Charlie was stronger than Bob's. However, if Andrew and Bob tweeted on Monday, Charlie tweeted on Tuesday, and David tweeted on Wednesday, we would know that David's tweet could not have influenced Charlie's.
To the best of our knowledge, we are the first to propose linear-threshold-based negativity-aware diffusion models that have monotone submodular objectives. Currently, only independent-cascade-based negativity-aware models are known to have these properties. We develop online learning and influence maximization algorithms for our models. Specifically, under mild stability assumptions, we develop online learning algorithms whose average expected regret is of order $O(T^{-c})$ for any constant $c < 1$, where $T$ is the number of rounds. These are the first regret guarantees for node-level feedback models of influence maximization of any kind.
The rest of the paper is organized as follows: in Section 2, we review the classical information-diffusion models and the influence maximization problem in the online-learning setting. We also summarize existing works on negativity-aware variants of these models. In Section 3, we introduce our LT-based negativity-aware diffusion models. We prove monotonicity and submodularity properties for our models in Section 4. In Section 5 and Section LABEL:sec:ONLTN, we introduce an online-learning version of our problem. We propose an online-learning algorithm and benchmark its performance against an algorithm that has access to the exact diffusion parameters.
2 Literature Review
Researchers have proposed various diffusion models for information spread and have extensively explored ways to maximize the spread of influence in these models. In their seminal work, kempe2003maximizing propose the so-called Influence Maximization (IM) problem. In IM, a social network is modeled as a directed graph $G = (V, E)$, where each node in the node set $V$ represents a user, and a directed edge $(u, v)$ in the edge set $E$ indicates that information can spread from user $u$ to user $v$. They consider a viral marketing problem on the graph $G$, where a decision maker seeks to identify an optimal set of seed users to initiate an influence diffusion process, so that the expected number of people eventually influenced by the information diffusion is maximized. They put forward two diffusion models, the Independent Cascade (IC) model and the Linear Threshold (LT) model. We describe these models briefly.
In the IC model, each edge $(u, v)$ has an associated weight, denoted $w(u, v)$. This weight measures the likelihood with which user $u$ successfully influences user $v$. We use $w$ to represent a function from $E$ to $[0, 1]$ that maps each edge to its corresponding weight. We refer to the function $w$ as the weights. IC specifies an influence diffusion process in discrete time steps. Initially, all nodes are inactive. In step $0$, a seed set $S$ of users is selected and activated. In each subsequent step $t \ge 1$, each user activated in step $t - 1$ has a single chance to activate her inactive downstream neighbors, independently with success probabilities equal to the corresponding edge weights. This process terminates when no more users can be activated. The set of users activated during the IC process is precisely the set of users who have been influenced by the information.
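The IC dynamics above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper; the adjacency encoding and function name are our own.

```python
import random

def simulate_ic(graph, seeds, seed=0):
    """One cascade of the Independent Cascade (IC) model.

    graph: dict mapping node -> list of (neighbor, edge_weight) pairs.
    Returns the set of all nodes activated by the cascade.
    """
    rng = random.Random(seed)
    active = set(seeds)
    frontier = list(seeds)                # nodes activated in the previous step
    while frontier:
        new = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # u gets a single chance to activate each inactive neighbor v
                if v not in active and rng.random() < p:
                    active.add(v)
                    new.append(v)
        frontier = new
    return active

# Toy graph with deterministic weights (illustrative only):
g = {"a": [("b", 1.0)], "b": [("c", 1.0)]}
print(sorted(simulate_ic(g, {"a"})))      # ['a', 'b', 'c']
```

With all weights equal to one the cascade reaches every node on a directed path from the seed, which makes the sketch easy to sanity-check.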
The LT model, on the other hand, focuses more on describing the combined effect of neighbours in influencing a node. In this model, each edge $(u, v)$ is still associated with a weight $w(u, v)$. Again we use $w$ to denote a function from $E$ to $[0, 1]$ that maps each edge to its corresponding weight and refer to the function $w$ as the weights. It is also assumed that the sum of the incoming edge weights for each node is at most one. That is, $\sum_{u \in N^{in}(v)} w(u, v) \le 1$ for every $v \in V$, where $N^{in}(v)$ denotes the set of in-neighbors of $v$. The LT diffusion process also unfolds in discrete time steps. In step $0$, all nodes in the seed set $S$ become activated, and each non-seed node $v$ independently samples a threshold $\theta_v$ uniformly from $[0, 1]$. In each subsequent step $t \ge 1$, for each inactive node $v$, if
\[
\sum_{u \in N^{in}(v) \cap A_{t-1}} w(u, v) \ge \theta_v,
\]
where $A_{t-1}$ denotes the set of nodes activated by the end of step $t - 1$, then $v$ becomes activated. This process terminates after a step in which no nodes change their activation status.
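For concreteness, the LT activation rule can be sketched as follows (a minimal illustration with our own data encoding, not code from the paper). Because activation is monotone, sweeping to a fixed point yields the same final set as the step-by-step process.

```python
import random

def simulate_lt(in_weights, seeds, seed=0):
    """One cascade of the Linear Threshold (LT) model.

    in_weights: dict mapping node v -> {u: w(u, v)} of incoming edge weights,
    with each node's incoming weights summing to at most 1.
    Returns the set of activated nodes.
    """
    rng = random.Random(seed)
    nodes = set(in_weights)
    for ws in in_weights.values():
        nodes.update(ws)
    theta = {v: rng.random() for v in nodes}   # thresholds ~ Uniform[0, 1]
    active = set(seeds)
    changed = True
    while changed:                             # repeat until no new activations
        changed = False
        for v in nodes - active:
            incoming = sum(w for u, w in in_weights.get(v, {}).items() if u in active)
            if incoming >= theta[v]:
                active.add(v)
                changed = True
    return active

# Both in-neighbors of "c" are seeds, so its incoming active weight is 1.0,
# which meets any Uniform[0, 1) threshold:
print(sorted(simulate_lt({"c": {"a": 0.5, "b": 0.5}}, {"a", "b"})))  # ['a', 'b', 'c']
```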
Given a diffusion model, let $\sigma(S)$ denote the expected number of nodes activated during the diffusion process given the seed set $S$ and diffusion parameters $w$. We say that $\sigma$ is monotone if for any $S \subseteq S'$, $\sigma(S) \le \sigma(S')$. If for any $S \subseteq S'$ and any node $v \notin S'$, $\sigma(S \cup \{v\}) - \sigma(S) \ge \sigma(S' \cup \{v\}) - \sigma(S')$, then we say that $\sigma$ is submodular. kempe2003maximizing have shown that it is NP-hard to find an optimal seed set with respect to either the IC or the LT model. However, $\sigma$ has been proved to be both monotone and submodular with respect to the two diffusion models. As a result, a greedy-based algorithm can find a seed set $S$ such that $\sigma(S) \ge (1 - 1/e) \max_{|S'| \le k} \sigma(S')$ (Nemhauser1978). Due to the nice properties of monotonicity and submodularity, IC and LT have become the bases for many more complex diffusion models that were later developed.
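The greedy procedure implied by this guarantee is easy to state (a sketch; the toy coverage `spread` below is an illustrative stand-in for Monte Carlo estimation of the true influence function):

```python
def greedy_seed_set(nodes, spread, k):
    """Greedily add the node with the largest marginal gain in (estimated)
    expected spread. For a monotone submodular `spread`, this achieves a
    (1 - 1/e)-approximation of the best size-k seed set.
    """
    seeds = set()
    for _ in range(k):
        base = spread(seeds)
        best, best_gain = None, -1.0
        for v in nodes - seeds:
            gain = spread(seeds | {v}) - base
            if gain > best_gain:
                best, best_gain = v, gain
        seeds.add(best)
    return seeds

# Toy monotone submodular spread: size of the union of covered sets
# (made-up numbers; in practice spread(S) is estimated by simulation).
cover = {"a": {"a", "b"}, "b": {"b"}, "c": {"c", "d", "e"}}
spread = lambda S: len(set().union(*(cover[v] for v in S))) if S else 0
print(sorted(greedy_seed_set(set(cover), spread, 2)))  # ['a', 'c']
```

The routine first picks "c" (marginal gain 3), then "a" (gain 2 over the current coverage), illustrating how marginal gains shrink as the seed set grows.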
2.1 Negativity-aware diffusion models
Existing models for influence diffusion primarily focus on the spread of a single attitude, which we may regard as positive influence for simplicity. More precisely, whenever a user is influenced during the information diffusion process, she adopts a positive attitude towards the information being spread. In practice, however, we cannot guarantee such uniformity in attitude, especially when the message being promoted is controversial in nature.
A few authors were motivated to consider potential negative reactions and the spread of negative attitudes (chen2011influence, neg1, neg2, neg3, neg4). They propose new negativity-aware models that allow a node to become either positively or negatively influenced. In these models, the basic influence maximization problem is to identify a seed set of size $k$ that maximizes the expected number of positively influenced nodes.
chen2011influence propose the first negativity-aware model. In addition to the influence probabilities $w$, they assume that there is a quality factor $q$ representing the quality of the product being promoted. While the activation process follows that of IC, once a node is chosen as a seed node or is activated by a positive upstream neighbor, it becomes positive with probability $q$ and negative with probability $1 - q$, independently of everything else. Meanwhile, if the node is influenced by a negative upstream neighbor, it becomes negative with certainty. Let us denote the expected final number of positively influenced nodes by $\sigma^+(S)$. For a fixed $q$, chen2011influence show that $\sigma^+$ is monotone and submodular. In addition, they show that if all edge probabilities equal one, then given a seed set $S$, the probability that a node $v$ turns positive is $q^{d(S, v) + 1}$, where $d(S, v)$ is the length of a directed shortest path from $S$ to $v$ in $G$. Their model has a strong negativity bias: any node activated by a negative upstream neighbor can only turn negative. In reality, however, when the information being propagated is controversial, a person might be influenced by her friends' strong attitudes to look into the issue, yet develop a different attitude towards it. Another limitation of this model is that $q$ cannot be a function of individual nodes, reflecting users' individual attitudes; it must be a uniform constant. Otherwise, the influence function is no longer monotone or submodular. In Section LABEL:apd:ICNbreak of the appendices, we provide an example in which the greedy algorithm can have an arbitrarily bad approximation ratio when the quality factors are heterogeneous.
neg1's model is richer, with four different types of users: (dis)satisfied (non)complainers. Each type has a different but fixed probability of participating in negative word-of-mouth. However, the model is still not flexible enough to account for the richness of user characteristics. Other, more refined models are generally intractable (neg2, neg4, neg3). neg2 introduce a real-valued opinion indicator for each user. They propose a two-phase Linear Threshold-based model in which users' opinion indicators are updated according to the incoming influence from activated friends. Although it models nodes' attitudes using real numbers, the influence maximization problem with respect to the proposed model is NP-hard to approximate within any constant approximation ratio. neg4 further improve upon neg2's model by including both opinion indicators and interactions, i.e., how information is perceived between two nodes. Their model is compatible with both LT and IC. However, the corresponding influence maximization problem is still NP-hard to approximate within any constant approximation ratio. neg3 propose a continuous-time diffusion model using Poisson processes. Each node has its own attitude score that falls between 0 and 1. They also consider a negative seed set as well as a counter-response positive seed set. However, the model is too involved for theoretical analysis, and thus only empirical evaluations were conducted.
To the best of our knowledge, we are the first to propose negativity-aware diffusion models that are not only flexible enough to incorporate a variety of individual user characteristics but also have monotone submodular objective functions. We allow users with different characteristics to have different information-sharing behaviors and attitude-formation patterns. Due to the monotonicity and submodularity of the objective functions, we can use the greedy algorithm to obtain a $(1 - 1/e - \epsilon)$-approximate solution, where $\epsilon$ is an error term caused by the (typical) use of simulation to evaluate the influence function.
2.2 Online learning for influence maximization
Another line of work focuses on the online-learning setting for influence maximization under the IC model (OIM, IMB, CUCB, contextualIM, DISBIM, wen2017online). In this setting, an agent starts with zero knowledge of the edge weights and has $T$ rounds to advertise a product. In each round, she can select a seed set of up to $K$ nodes based on information observed in previous rounds, called feedback. The goal is to maximize the total expected influence spread over all rounds.
Two feedback mechanisms have been proposed. Under edge-level semi-bandit feedback, the agent observes, for each activated node, whether its attempts to activate its downstream neighbors succeeded or not. Under node-level feedback, on the other hand, only the identities of the newly activated nodes in each diffusion step can be observed. More precisely, when a node $v$ is activated in step $t$ and more than one of its upstream neighbors were activated in step $t - 1$, it is not possible to discern which of these upstream neighbors activated $v$. For IC-based diffusions, either node-level or edge-level feedback can be assumed. LT-based models, on the other hand, assume a joint effort of active parent nodes in activating a given child node. As a result, for LT-based diffusions, node-level feedback is the only natural setup.
We are the first to provide an explicit regret guarantee for online learning under node-level feedback for an influence maximization problem of any kind. To date, the edge-level semi-bandit feedback setting has been well characterized by various authors (CUCB, wen2017online), but not the node-level feedback setting. IMB use Maximum Likelihood Estimation-based techniques to learn from node-level feedback, but do not provide regret guarantees for their MLE-based learning algorithm.
3 Negativity-aware diffusion model
In this section, we introduce a new negativity-aware diffusion model based on the Linear Threshold model, which we refer to as the Negativity-Aware Linear Threshold (LTN) model.
In LTN, each node can be in one of three possible states at any time: positive, negative, or inactive. Positive (resp. negative) means that the node holds a positive (resp. negative) attitude towards the information being propagated, while inactive means that the node has not yet developed any attitude towards the information, due to, for example, lack of awareness. We assume that, initially, all nodes are in the inactive state.
A person's attitude is determined not only by her friends' attitudes but also by her own experience and value judgment. To incorporate such personal bias, we introduce two autonomy factors $\alpha_v^+, \alpha_v^- \ge 0$ associated with each node $v$, such that $\alpha_v^+ + \alpha_v^- \le 1$. The autonomy factors for each node depend on the information being promoted, as well as on the node's unique characteristics. In other words, $\alpha_v^+$ (resp. $\alpha_v^-$) is the weight that $v$ places on her own positive (resp. negative) attitude in responding to the information. The belief score
\[
b_v = \alpha_v^+ + \alpha_v^-
\]
measures the amount of trust that node $v$ places on her own judgment. Intuitively, the smaller $b_v$ is, the more susceptible $v$ is to others' attitudes. For now, we assume that $\alpha_v^+$ and $\alpha_v^-$ are both known.
A person also tends to place different weights on different friends' influences. We model this by having a weight $w(u, v)$ associated with each edge $(u, v)$. The larger $w(u, v)$ is, the more influential $u$'s attitude is on $v$. We assume that for each node, the sum of the weights of its incoming edges lies between 0 and 1. More precisely, letting $N^{in}(v)$ be the set of in-neighbors of $v$, we assume that
\[
0 \le \sum_{u \in N^{in}(v)} w(u, v) \le 1.
\]
During the LTN diffusion process, we assume that positive and negative influences from friends, rather than cancelling each other out, jointly prompt each person to take note of the information being diffused. Intuitively, the fact that a piece of information triggers different reactions among people around us should further pique our interest to learn about it, and eventually to develop our own attitude toward it. Accordingly, in our model, a node is activated the first time the sum of weights from its active neighbours exceeds a threshold. After being activated, the node decides on its attitude (positive or negative) based on the ratio between the positive influence (sum of weights from positively activated friends) and the negative influence (sum of weights from negatively activated friends) exerted on the node most recently, as well as the node's own belief score.
Mathematically, the LTN diffusion process unfolds in discrete time steps as follows, starting from the nodes in the chosen seed set $S$. (We reserve the word “round” for online learning.)

Each node $v$ independently samples a threshold $\theta_v$ uniformly from $[0, 1]$.

In step $-1$, all nodes are inactive. Set $A_{-1} = P_{-1} = N_{-1} = \emptyset$.

In step $0$, all seed nodes become positive, and all non-seed nodes are inactive. Set $A_0 = P_0 = S$ and $N_0 = \emptyset$.

In general, let $A_t$ (resp. $P_t$, $N_t$) denote the set of nodes that are activated (resp. positive, negative) by the end of time step $t$. In each subsequent time step $t \ge 1$, for each inactive node $v$, if
\[
\sum_{u \in N^{in}(v) \cap A_{t-1}} w(u, v) \ge \theta_v,
\]
then $v$ becomes active. It turns positive with probability
\[
\alpha_v^+ + (1 - \alpha_v^+ - \alpha_v^-) \cdot \frac{\sum_{u \in N^{in}(v) \cap (P_{t-1} \setminus P_{t-2})} w(u, v)}{\sum_{u \in N^{in}(v) \cap (A_{t-1} \setminus A_{t-2})} w(u, v)} \qquad (1)
\]
and negative otherwise. Note that the probability of the node turning positive or negative is a convex combination of its own belief and the most recent influences from its active neighbours.

The process terminates when no more inactive nodes can be activated. Let $A$ (resp. $P$, $N$) denote the set of active (resp. positive, negative) nodes at the end of the process, which runs for at most $|V|$ steps.
Note that a node must become either positive or negative once activated. Meanwhile, as in the original LT diffusion model, the nodes that are activated in the current time step do not affect other nodes in the same time step.
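Putting the steps together, one LTN cascade can be simulated as below. This is a sketch under our reading of the sign rule (1): upon activation, a node turns positive with probability $\alpha_v^+$ plus $(1 - \alpha_v^+ - \alpha_v^-)$ times the ratio of most-recent positive incoming weight to most-recent total incoming active weight. The data encoding and names are our own.

```python
import random

def simulate_ltn(in_weights, alpha, seeds, seed=0):
    """One cascade of the LTN model (sketch). Returns (positive, negative) sets.

    in_weights: v -> {u: w(u, v)}; alpha: v -> (alpha_plus, alpha_minus).
    Seeds start positive; the sign rule follows our reading of Eq. (1).
    """
    rng = random.Random(seed)
    nodes = set(in_weights)
    for ws in in_weights.values():
        nodes.update(ws)
    theta = {v: rng.random() for v in nodes}     # thresholds ~ Uniform[0, 1]
    pos, neg = set(seeds), set()
    newly = {u: "+" for u in seeds}              # signs of last step's activations
    while newly:
        nxt = {}
        for v in nodes - pos - neg:
            act = {u: w for u, w in in_weights.get(v, {}).items()
                   if u in pos or u in neg}
            if sum(act.values()) >= theta[v]:    # LT activation condition
                w_recent = sum(w for u, w in act.items() if u in newly)
                w_plus = sum(w for u, w in act.items() if newly.get(u) == "+")
                ap, am = alpha.get(v, (0.0, 0.0))
                ratio = w_plus / w_recent if w_recent > 0 else 0.0
                p_pos = ap + (1.0 - ap - am) * ratio   # convex combination
                nxt[v] = "+" if rng.random() < p_pos else "-"
        for v, s in nxt.items():                 # apply activations synchronously
            (pos if s == "+" else neg).add(v)
        newly = nxt
    return pos, neg

# A single positive seed with full edge weight deterministically turns "b" positive:
pos, neg = simulate_ltn({"b": {"a": 1.0}}, {"b": (0.0, 0.0)}, {"a"})
print(sorted(pos), sorted(neg))                  # ['a', 'b'] []
```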
4 Influence Maximization
Under the LTN model, we consider the problem of choosing at most $K$ seed nodes to maximize the expected number of positive nodes at the end of the diffusion process. More rigorously, let $\sigma^+(S)$ be the expected number of positive nodes at the end of the diffusion process under LTN given seed set $S$. Our goal is to maximize $\sigma^+(S)$, subject to the cardinality constraint that $|S| \le K$ for some positive integer $K$.
This problem is an extension of influence maximization under the original Linear Threshold model, which is NP-hard (kempe2003maximizing). In Theorem 4 below, we prove that $\sigma^+$ is monotone submodular under LTN. Nemhauser1978 show that when the set function one wants to maximize is monotone and submodular, the greedy algorithm guarantees a $(1 - 1/e)$-approximation. Therefore, it follows that greedy is a $(1 - 1/e)$-approximation algorithm for our problem. {theorem}Let $\sigma^+(S)$ be the expected number of positive nodes at the end of the diffusion process under LTN given seed set $S$. Then, $\sigma^+$ is monotone submodular.
Proof.
Proof sketch. We define another diffusion model that we call the negativity-aware triggering set model (TSN). We first show that the expected positive influence spread function of TSN is monotone submodular. We then show that the set of positively (negatively) activated nodes in each step of LTN has the same distribution as that in TSN. In this way, we conclude that the expected positive influence function of LTN is also monotone submodular. The details of the proof are included in Section LABEL:sec:LTNmonsub of the appendices.
5 Learning From NodeLevel Feedback
The previous two sections are based on the assumption that both the edge weights and the autonomy factors are known. In this section, we consider an online-learning setting of this problem. Namely, the autonomy factors and the edge weights are initially unknown and need to be gradually learned. We further assume that the autonomy factors and the edge weights admit linear generalizations. More specifically, we assume that there exist two unknown vectors $\theta^* \in \mathbb{R}^d$ and $\beta^* \in \mathbb{R}^{d'}$. For each edge $e \in E$, we have a known feature vector $x_e \in \mathbb{R}^d$ such that $w(e) = x_e^\top \theta^*$. For each node $v \in V$, we have two known feature vectors $y_v^+, y_v^- \in \mathbb{R}^{d'}$ such that $\alpha_v^+ = (y_v^+)^\top \beta^*$ and $\alpha_v^- = (y_v^-)^\top \beta^*$. With these linear generalizations, learning the autonomy factors and the weights amounts to learning the corresponding unknown vectors $\theta^*$ and $\beta^*$.
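As a small numerical illustration of the linear parameterization (all numbers made up; in the learning problem the parameter vector is unknown):

```python
# Edge weight as an inner product of a known edge feature with the unknown
# parameter vector; numbers are hypothetical, for illustration only.
theta_star = [0.2, 0.1, 0.4]   # unknown parameter the learner must estimate
x_e = [1.0, 0.5, 0.25]         # known feature vector of one edge e
w_e = sum(a * b for a, b in zip(x_e, theta_star))
print(round(w_e, 2))           # 0.35
```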
Recall that the node activation process in our LTN model follows the classical LT model. After a node is activated, the sign of the activation (positive or negative) depends on both the autonomy factors and the most recent influences from its active friends, as defined in (1), where $b_v = \alpha_v^+ + \alpha_v^-$ is the belief score of $v$ that was defined previously.
Our plan is to learn the parameter $\theta^*$ of the weight function from the node activation observations. For this part, we do not use the observed signs of the activations. Therefore, our result applies to the online-learning setting with respect to the classical LT model, and we present the learning framework under the classical LT model in this section. To learn $\beta^*$ for the autonomy factors with respect to LTN, we use the signs of the observed node activations. In the next section, we extend the framework to account for the learning of $\beta^*$.
Under the classical LT model, in each round, the agent activates a seed set that initiates the information diffusion on the network. Unlike the edge-level feedback model assumed by most of the existing online influence maximization literature with IC as the underlying diffusion model, where the status of each edge that takes part in the diffusion can be observed, in our node-level feedback model, we assume that the agent can only observe the node statuses. More specifically, in each diffusion time step, the agent observes whether or not a node becomes positively or negatively activated or remains inactive, but she does not get to observe how each of its active parents contributes to this node's activation. Since several edges may contribute to a node's activation simultaneously, it is hard to discriminate the contribution of each individual edge and estimate the edge weights accurately. Because of this difficulty, online learning with node-level feedback has remained largely unexplored until this work.
In this section, we first mathematically formulate the online learning problem and discuss some assumptions we impose. Then we investigate the key obstacles in learning and propose an algorithm that performs weight estimation and selects seed sets in each round. Finally, we analyze the theoretical performance of the algorithm. Specifically, we first show that the average cumulative regret of Algorithm LABEL:alg:IM02 with hyperparameter $c$ is bounded by $O(T^{-c})$ for any constant $c < 1$, where $T$ is the total number of rounds. This improves upon the average cumulative regret of $O(1/\sqrt{T})$ obtained by wen2017online for an online influence maximization problem with edge-level feedback. It is worth noting that the same regret bounds could be achieved if we applied Algorithm LABEL:alg:IM02 to edge-level feedback problems.
5.1 Learning in the classical LT model
wen2017online have investigated the performance of the edge-level feedback IC model and proposed a UCB-type learning algorithm, IMLinUCB. However, it is hard to extend their work to any node-level feedback model such as LTN. The main challenge comes from parameter estimation. In round $t$, IMLinUCB estimates $\theta^*$ using ridge regression with realizations of independent Bernoulli trials on the edges observed so far. Denote the ridge regression estimate in round $t$ by $\hat{\theta}_t$. IMLinUCB constructs a confidence ball around $\hat{\theta}_t$, and derives an upper confidence bound (UCB) weight for every edge. IMLinUCB then selects the seed set by feeding the graph and the UCB weights to a greedy approximation oracle.
Intuitively, with more observations, the UCB weight converges to the true weight for each edge, thus making the selected seed set a $(1 - 1/e)$-approximation solution to the optimum.
Unfortunately, the structural similarities between their IM problem and the classical linear contextual bandit problem, and between IMLinUCB and the classical algorithms in abbasi2011improved, no longer hold under node-level feedback. As several edges can simultaneously contribute to the activation of a single node, it is generally not possible to estimate the weight and upper confidence bound of each individual edge accurately. Consider the simple example illustrated in Figure 1. We have two edges $e_1$ and $e_2$ with corresponding features $x_1$ and $x_2$. Suppose that the true weights on $e_1$ and $e_2$ are $w_1$ and $w_2$, and that these two edges are always observed simultaneously. In this case, with more observations, the estimate of the sum $w_1 + w_2$ converges to its true value. However, if we try to estimate $w_1$ and $w_2$ separately, since these two edges are always observed together, the individual estimates need not converge to $w_1$ and $w_2$. This example shows that the upper confidence bound of an individual edge does not necessarily converge to its true weight even if we have infinitely many observations of this edge.
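This non-identifiability is easy to verify numerically. In the sketch below (our own toy setup, not from the paper), a child node with two always-active parents yields Bernoulli$(w_1 + w_2)$ node-level observations, so any split of the same sum has identical likelihood:

```python
import math
import random

rng = random.Random(0)
w1_true, w2_true = 0.3, 0.5   # hypothetical true weights on e1 and e2

# With a Uniform[0,1] threshold and both parents active, the child activates
# with probability w1 + w2; node-level feedback reveals only this Bernoulli:
data = [rng.random() < (w1_true + w2_true) for _ in range(10000)]

def log_lik(w1, w2):
    p = w1 + w2
    return sum(math.log(p) if y else math.log(1.0 - p) for y in data)

# Two different splits of the same sum are observationally equivalent:
print(abs(log_lik(0.3, 0.5) - log_lik(0.1, 0.7)) < 1e-9)   # True
```

No amount of additional data of this kind can separate $w_1$ from $w_2$; only the sum is identified.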
The example above indicates that there is no quick extension of IMLinUCB to LTN. Thus, further assumptions as well as more sophisticated algorithms are required to ensure increasingly accurate edge weight estimation as more node-level realizations are observed.
Technical assumptions
Recall that in each round $t$, the nodes are activated in discrete time steps according to our LTN model, with nodes in the seed set being activated in time step $0$ of round $t$. For each node $v$, we use $\tau_{t,v}$ to denote the time step at which node $v$ becomes activated in round $t$. When $v$ is a seed node, we have $\tau_{t,v} = 0$. If $v$ is not activated in round $t$, then we set $\tau_{t,v} = \infty$.
For each node $v$, define its set of relevant parents in round $t$ as the set of in-neighbors of $v$ that were activated strictly before $v$ (or, if $v$ remains inactive, all of $v$'s in-neighbors activated during the diffusion). That is, the set of relevant parents is the set of nodes that are relevant to the activation status of $v$ in round $t$. We say the weight on an edge $(u, v)$ is active if $u$ has been activated. When $v$ is activated, its relevant parents are the parent nodes that collectively push the sum of active incoming weights at $v$ past $v$'s threshold for the first time. When $v$ is not activated, its relevant parents are the nodes that have collectively failed to push the sum of active incoming weights at $v$ past its threshold. Note that for an inactive node $v$ in round $t$, the set of relevant parents might not be empty, since some of its parent nodes might be activated during the diffusion process but fail to activate $v$.
Our analysis is based on a few assumptions on the weights and solution stability, which we will state and justify below.
We first introduce assumptions on the edge weights. The first assumption is a linear generalization of the edge weights. We assume that each edge $e$ has an edge feature vector $x_e \in \mathbb{R}^d$ that characterizes the relationship between $e$'s two end nodes. The weight on each edge is a linear mapping of its feature. More formally, we have {assumption}[Linear parameterization] There exists $\theta^* \in \mathbb{R}^d$ such that the true edge weights are $w(e) = x_e^\top \theta^*$ for all $e \in E$. By the assumption on the incoming weight sums of our LT model, we have $\sum_{u \in N^{in}(v)} w(u, v) \le 1$ for all $v \in V$. Such a linear generalization of diffusion parameters is also used in wen2017online. The generalization makes our learning model more scalable.
We use $\theta$ to denote a generic vector in $\mathbb{R}^d$ and refer to it as the parameter. We denote the true parameter by $\theta^*$. Similar to the assumption for LTN, we assume that $x_e^\top \theta^* \in [0, 1]$ for all $e \in E$. Furthermore, we assume that the “aggregated” features are bounded too: {assumption}[Feature regularity] For all $v \in V$ and all $U \subseteq N^{in}(v)$, $\| \sum_{u \in U} x_{(u,v)} \|_2 \le 1$. Note that Assumption 5.1.1 is similar to the feature boundedness assumptions in many existing works on contextual linear bandit problems. For example, wen2017online assume that the norms of the edge features are bounded. Similar assumptions are also made by abbasi2011improved and chu2011contextual. In addition, the LTN model, like any other LT model, requires the sum of the weights of the incoming edges of every node to be bounded by $1$. It is thus natural to assume that the norm of the sum of any subset of incoming features at every node is bounded by 1, which can always be achieved by an appropriate scaling of the feature space.
One of our key ideas is to make sure that the features of observed edges are diverse enough to allow information to be collected in all directions of $\theta^*$, so that $\hat{\theta}_t \to \theta^*$ as $t \to \infty$. More specifically, we impose a feature diversity assumption as follows: {assumption}[Feature diversity] There exist edges $e_1, \dots, e_m$ such that the matrix $\sum_{i=1}^m x_{e_i} x_{e_i}^\top$ is positive definite. In other words, its minimum eigenvalue is strictly greater than $0$. It is easy to see that the existence of $d$ edges with linearly independent features is sufficient to ensure Assumption 5.1.1. This should be easy to satisfy, as the dimension of the features is usually much smaller than the total number of edges, that is, $d \ll |E|$. Under this assumption, if we keep exploring those edges, the confidence region will shrink in all feature directions, so that $\hat{\theta}_t \to \theta^*$ as $t \to \infty$.
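The feature diversity condition is easy to check for a given feature set. A minimal sketch for $d = 2$ with hypothetical features (the closed-form eigenvalue used below is specific to symmetric $2 \times 2$ matrices):

```python
# Gram matrix M = sum_e x_e x_e^T for three hypothetical 2-D edge features.
feats = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

a = sum(x * x for x, _ in feats)   # M[0][0]
b = sum(x * y for x, y in feats)   # M[0][1] = M[1][0]
c = sum(y * y for _, y in feats)   # M[1][1]

# Minimum eigenvalue of the symmetric matrix [[a, b], [b, c]]:
lam_min = (a + c) / 2 - (((a - c) / 2) ** 2 + b ** 2) ** 0.5
print(lam_min)   # 1.0 -> positive definite, so the features span R^2
```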
Let $K$ be the maximum seed set cardinality. Let $\sigma(S, \theta)$ be the expected total influence given seed set $S$ and the edge weights induced by parameter $\theta$. Denote an optimal seed set by $S^{opt} \in \arg\max_{|S| \le K} \sigma(S, \theta^*)$ and the optimal value by $\sigma^* = \sigma(S^{opt}, \theta^*)$. As discussed in Section 4, the objective function is monotone and submodular, so that a greedy algorithm with exact evaluation of $\sigma$ returns a $(1 - 1/e)$-approximation solution. Since evaluating $\sigma$ is #P-hard, we assume access to an approximation oracle: {assumption}[Approximation oracle] For $\alpha \in (0, 1)$ and $\varepsilon \in (0, 1)$, there exists an efficient, possibly random oracle that takes the graph, the parameter $\theta$, and the budget $K$ as inputs, and outputs a seed set $S_\theta$ such that $\sigma(S_\theta, \theta) \ge \alpha \max_{|S| \le K} \sigma(S, \theta)$ with probability at least $1 - \varepsilon$. An example of $\alpha$ is $1 - 1/e - \epsilon$. The reverse reachable set method in tang2014influence can be easily extended to obtain such an approximation oracle.
We also impose assumptions on the solution stability of the network. Consider the solution returned by the approximation oracle around . Define the approximation seed sets with respect to as . That is, is the set of seed sets whose expected influences with respect to edge weights are at least times that of an optimal seed set. Our next assumption states that the set of approximation sets and the optimal seed set are invariant under small perturbations. {assumption}[Stability] There exists a constant such that for any that satisfies , we have . Moreover, there exists a seed set such that for all such that . The above assumption will be satisfied under mild conditions. In Lemma 5.1.1 below, we provide a sufficient condition for Assumption 5.1.1 to hold, and show that this stability condition holds with probability one. {lemma} Let be the true parameter, be the set of approximation sets with respect to , and be the optimal value with respect to . Assumption 5.1.1 holds whenever . This sufficient condition holds with probability 1 if we sample uniformly from an interval with .
The proof of Lemma 5.1.1 is in Section LABEL:apd:onlinelemmasthm:2 of the appendices. It shows that Assumption 5.1.1 fails only when there is a set that provides exactly an approximation, which happens with probability zero.
The stability assumption is crucial for the theoretical analysis of our algorithm. Suppose the current estimate is close enough to that , and the greedy algorithm successfully returns a size approximation solution to such that . By Assumption 5.1.1, we have , which implies that is also an approximate solution to , and it yields zero regret. Although Assumption 5.1.1 is mild and opens the possibility of better theoretical guarantees for online learning, it has not, to the best of our knowledge, been exploited by any previous algorithm.
Performance metrics
One of the most important metrics for evaluating the performance of online learning algorithms is the average regret, that is, the cumulative loss in reward divided by the horizon length . This cumulative loss is incurred both because of inaccurate estimation of the edge weights and because of the randomness of the approximation oracle invoked. It is worth noting that the loss due to a random oracle cannot be eliminated even if the true edge weights were known.
To analyze the performance of our online learning algorithms, we adopt the average scaled regret proposed in wen2017online. In particular, let be the optimal size seed set with respect to the true parameter , and be the seed set selected at round . We consider the metric , where is the total number of rounds in a finite horizon, and . When , reduces to the standard expected average regret .
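As a small illustration, the scaled metric can be computed as below. Following the form in wen2017online, we assume each round contributes the optimal value minus the achieved value divided by the scaling factor; the function names are hypothetical:

```python
def average_scaled_regret(opt_value, round_values, eta):
    """Average eta-scaled regret over T rounds: each round contributes
    f(S*) - f(S_t) / eta, where `opt_value` is the optimal expected
    influence under the true parameter and `round_values` lists the
    per-round values achieved by the algorithm. With eta = 1 this
    reduces to the standard expected average regret."""
    T = len(round_values)
    return sum(opt_value - v / eta for v in round_values) / T

# matching the optimum every round gives zero regret
assert average_scaled_regret(10.0, [10.0, 10.0], 1.0) == 0.0
```

Under this convention an algorithm that consistently achieves an eta-fraction of the optimum incurs zero scaled regret.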
Algorithms
Under the assumptions introduced above, we propose online learning algorithms to learn the true parameter and select seed sets effectively.
Let be the exploration set consisting of diverse edges that satisfy Assumption 5.1.1. We partition the time horizon into multiple epochs, each containing a number of exploration and exploitation rounds determined by a hyperparameter . Specifically, the th epoch consists of exploration rounds followed by exploitation rounds. More generally, is the index of the first round of epoch , while and are the sequences of exploration and exploitation rounds in the th epoch, respectively.
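The epoch layout described above can be sketched as follows. Since the exact round counts are set by the paper's hyperparameter, the fixed number of exploration rounds and the geometric growth rate used here are assumed placeholders:

```python
def epoch_schedule(num_epochs, n_explore, growth=2):
    """Illustrative partition of the horizon into epochs: each epoch
    runs `n_explore` exploration rounds (one per exploration edge)
    followed by a geometrically growing number of exploitation
    rounds. `growth` is an assumed placeholder, not a value
    prescribed by the paper."""
    schedule, t = [], 1
    for i in range(num_epochs):
        explore = list(range(t, t + n_explore))
        t += n_explore
        exploit = list(range(t, t + growth ** i))
        t += growth ** i
        schedule.append((explore, exploit))
    return schedule

sched = epoch_schedule(3, 2)
assert sched[0] == ([1, 2], [3])
assert sched[1] == ([4, 5], [6, 7])
```

The key structural feature is that the exploitation blocks lengthen over epochs while the exploration blocks stay fixed, so the exploration fraction shrinks over time.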
We say that the activation status of a node is observed in round if any of its parent nodes is activated in the diffusion of round , and unobserved otherwise. In round , for any node whose activation status is observed, denote
(2) 
as the feature for the combined edge weights that are relevant to the activation status of , and let
(3) 
Use to denote the set of observed nodes in round .
In epoch , our first algorithm runs as follows:

After the exploration rounds in epoch , construct the least squares estimate using the observed node activation status as follows:
Let the covariance matrix and corresponding reward be
(4) Define

The algorithm then runs exploitation rounds. At the beginning of exploitation round , using the node activation status observed in the first rounds,
it invokes the oracle on with parameters to obtain the seed set .
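The estimation step above can be sketched as a regularized least-squares update. This is an illustrative helper; the ridge regularizer `reg` is an assumed placeholder, not a value prescribed by the paper:

```python
import numpy as np

def least_squares_estimate(X, y, reg=1.0):
    """Regularized least-squares estimate of the parameter after the
    exploration rounds of an epoch. Rows of X are the aggregated node
    features (Eq. (2)); entries of y are the corresponding observed
    activation indicators (Eq. (3))."""
    d = X.shape[1]
    M = reg * np.eye(d) + X.T @ X   # covariance matrix as in Eq. (4)
    b = X.T @ y                     # accumulated feature-weighted responses
    return np.linalg.solve(M, b)    # parameter estimate
```

On noiseless data the estimate recovers the generating parameter as the regularizer vanishes, which is the regime the exploration rounds drive the algorithm toward.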
Note that in the presentation of the algorithm above, we use only the node observations from exploration rounds to update our belief for . We do so to simplify the notation in later analysis. In practice, one can use observations from all rounds, including exploitation rounds, to update the belief on . The theoretical analysis remains the same, but the notation becomes more convoluted. The complete algorithm is summarized in Algorithm LABEL:alg:IM02.
It is worth noting that to ensure the full observability of the exploration edge , we only select a single seed node in each exploration round. This way, no other edges will be involved in the attempt to activate in the first diffusion step of each exploration round. In practice, instead of selecting only to make sure edge ’s realization is observed, we can select together with a set of seed nodes that are not connected to .
Note also that the number of exploration rounds per epoch is fixed at , while the number of exploitation rounds in the th epoch is . Thus, the ratio of exploration rounds to exploitation rounds decreases as the epoch index increases. Intuitively, each exploration round incurs regret. As the estimate approaches the true parameter , we can gradually decrease the number of exploration rounds to reduce the contribution of exploration to the total regret. At the same time, insufficient exploration could make inaccurate, which might lead to suboptimal seed selection and increase the total regret. A balance between exploration and exploitation is therefore required to minimize the total regret.
In the rest of this section, we provide a theoretical analysis of Algorithm LABEL:alg:IM02 and, under the assumptions above, derive an average per-round regret of , where is the total number of rounds.
5.2 Regret analysis
{theorem}Assume that Assumptions 5.1.1, 5.1.1, 5.1.1, 5.1.1, and 5.1.1 hold. Then the average regret of running Algorithm LABEL:alg:IM02 over rounds is