
Online Learning and Optimization Under a New Linear-Threshold Model with Negative Influence


Shuoguang Yang, Shatian Wang, Van-Anh Truong. Department of Industrial Engineering and Operations Research, Columbia University, New York, NY 10027. Emails: sy2614@columbia.edu, sw3219@columbia.edu, vt2196@columbia.edu

Abstract

We propose a new class of Linear Threshold Model-based information-diffusion models that incorporate the formation and spread of negative attitude. We call such models negativity-aware. We show that in these models, the expected positive influence is a monotone submodular function of the seed set. Thus, we can use a greedy algorithm to construct a solution with a constant approximation guarantee when the objective is to select a seed set of fixed size to maximize positive influence. Our models are flexible enough to account for both the features of local users and the features of the information being propagated in the diffusion.

We analyze an online-learning setting for a multi-round influence-maximization problem, where an agent actively learns the diffusion parameters over time while trying to maximize the total cumulative positive influence. We assume that in each diffusion step, the agent can only observe whether a node becomes positively or negatively influenced, or remains inactive. In particular, she does not observe the particular edge that brought about the activation of a node, if any, as is possible in most models that assume Independent Cascade (IC)-based diffusions. This model of feedback is called node-level feedback, as opposed to the more common edge-level feedback model, in which the agent is able to observe, for each node, the edge through which that node is influenced. Under mild assumptions, we develop online learning algorithms that achieve average cumulative expected regrets of order $O(T^{-c})$ for any $c < 1$, where $T$ is the total number of rounds. These are the first regret guarantees for node-level feedback models of influence maximization of any kind. Furthermore, under mild assumptions, this result also improves on the average regret of $\tilde{O}(1/\sqrt{T})$ for the edge-level feedback model in wen2017online, thus providing a new performance benchmark.

1 Introduction

As online social networks become increasingly integrated into our daily lives, popular platforms such as Facebook, Twitter, and YouTube have become important and effective media for advertising products and spreading ideas. Commercially, it has become routine for brands to use the word-of-mouth effect to promote products on social networks. In other spheres, politicians, activists, and even ordinary people can leverage these networks to instigate political and social change. Given the immense power of social networks in spreading information and ideas, it is not uncommon to see social-network marketing campaigns backfire. Even when a campaign is carefully designed, negative reactions might still arise due to the controversial nature of the information being propagated. Therefore, it is necessary to consider frameworks that allow for the formation and spread of negative attitude, whose likelihood depends on heterogeneous user demographics.

Motivated by the potential emergence of negative attitudes in social networks, we consider a negativity-aware multi-round influence-maximization problem. In our problem, an agent, hoping to promote certain information, conducts a marketing campaign over a time horizon, for example, three months. The time horizon is further divided into rounds, such as one-week periods. At the beginning of each round, the agent selects a fixed-cardinality seed set of users, called influencers, in the network. These users initiate a cascade of information spread through the network. The agent then closely monitors the subsequent influence-diffusion process in the social network. The rounds are independent and the round rewards are cumulative. The agent is aware of the potential emergence of negative reactions and possible negative influence during the diffusion process, but is initially unaware of the underlying parameters that govern the attitude diffusion. Her goal is twofold: first, to learn the parameters via the feedback she gathers during monitoring; second, to select the seed set in each round so as to maximize the total expected number of positively influenced users over all rounds. Our problem is relevant to the (Online) Influence Maximization literature. While most existing works model only positive influence, the works that do consider the spread of negative attitude are either not flexible enough to capture important real-world characteristics or are intractable due to a lack of desirable mathematical properties. Also, to the best of our knowledge, there is no influence-maximization framework that captures both online learning and the potential spread of negative attitude.

In this paper, we propose a novel class of Linear Threshold-based information-diffusion models that incorporate the formation and spread of negative attitude. We call such models negativity-aware. We show that in these models, the expected positive influence function is monotone submodular. Thus, we can use the greedy algorithm to construct seed sets of fixed sizes with constant approximation guarantees when the objective is to maximize expected positive influence. Our models are flexible enough to account for both the features of local users and the features of the information being propagated in the diffusion.

Next, we analyze an online-learning setting for a multi-round influence-maximization problem, where an agent is actively learning the diffusion parameters over time while trying to maximize total cumulative positive influence. We assume that in each diffusion step, the agent observes whether a node becomes positively or negatively influenced, or remains inactive. This assumption reflects the reality that network activity typically yields only two kinds of measurable observations: we can identify an activated user, but not the specific contributions of her neighbors; and we can observe the time of activation. For example, on Twitter, assume Charlie is a follower of both Andrew and Bob. If Andrew and Bob both retweet a story, and Charlie further retweets that story, we cannot determine whether Andrew's influence on Charlie was stronger than Bob's. However, if Andrew and Bob tweeted on Monday, Charlie tweeted on Tuesday, and David tweeted on Wednesday, we would know that David's tweet could not have influenced Charlie's.

To the best of our knowledge, we are the first to propose linear-threshold-based negativity-aware diffusion models that have monotone submodular objectives. Previously, only independent-cascade-based negativity-aware models were known to have these properties. We develop online learning and influence-maximization algorithms for our models. Specifically, under mild stability assumptions, we develop online learning algorithms that achieve average cumulative expected regrets of order $O(T^{-c})$ for any constant $c < 1$, where $T$ is the number of rounds. These are the first regret guarantees for node-level feedback models for influence maximization of any kind.

The rest of the paper is organized as follows. In Section 2, we review the classical information-diffusion models and the influence-maximization problem in the online-learning setting; we also summarize existing work on negativity-aware variants of these models. In Section 3, we introduce our LT-based negativity-aware diffusion models. We prove monotonicity and submodularity properties of our models in Section 4. In Sections 5 and LABEL:sec:ON-LTN, we introduce an online-learning version of our problem, propose an online-learning algorithm, and benchmark its performance against an algorithm that has access to the exact diffusion parameters.

2 Literature Review

Researchers have proposed various diffusion models for information spread and have extensively explored ways to maximize the spread of influence in these models. In their seminal work, kempe2003maximizing propose the so-called Influence Maximization (IM) problem. In IM, a social network is modeled as a directed graph $G = (V, E)$, where each node in the node set $V$ represents a user and a directed edge $(u, v)$ in the edge set $E$ indicates that information can spread from user $u$ to user $v$. They consider a viral marketing problem on the graph $G$, where a decision maker seeks to identify an optimal set of seed users to initiate an influence-diffusion process, so that the expected number of people eventually influenced by the information diffusion is maximized. They put forward two diffusion models, the Independent Cascade (IC) model and the Linear Threshold (LT) model. We describe these models briefly.

In the IC model, each edge $(u, v) \in E$ has an associated weight $w(u, v) \in [0, 1]$. This weight measures the likelihood with which user $u$ successfully influences user $v$. We use $w$ to represent a function from $E$ to $[0, 1]$ that maps each edge to its corresponding weight, and we refer to the function $w$ as the weights. IC specifies an influence-diffusion process in discrete time steps. Initially, all nodes are inactive. In step $0$, a seed set $S$ of users is selected and activated. In each subsequent step $t \geq 1$, each user activated in step $t - 1$ has a single chance to activate each of her inactive downstream neighbors, independently, with success probability equal to the corresponding edge weight. This process terminates when no more users can be activated. The set of users activated during the IC process is precisely the set of users who have been influenced by the information.
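To make the IC dynamics concrete, the following is a minimal simulation sketch (our illustration, not code from the papers cited); `graph` is an assumed adjacency representation mapping each node to its `(neighbor, weight)` pairs.

```python
import random

def simulate_ic(graph, seeds):
    """One run of the Independent Cascade process.

    graph: dict mapping each node u to a list of (v, w) pairs, where
           w = probability that u activates v (the edge weight).
    seeds: iterable of seed nodes activated in step 0.
    Returns the set of all activated nodes.
    """
    active = set(seeds)          # all nodes activated so far
    frontier = list(seeds)       # nodes activated in the previous step
    while frontier:
        new_frontier = []
        for u in frontier:
            for v, w in graph.get(u, []):
                # each newly activated node gets one independent chance
                # to activate each of its inactive downstream neighbors
                if v not in active and random.random() < w:
                    active.add(v)
                    new_frontier.append(v)
        frontier = new_frontier
    return active
```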

The LT model, on the other hand, focuses on describing the combined effect of neighbors in influencing a node. In this model, each edge $(u, v) \in E$ is still associated with a weight $w(u, v)$. Again, we use $w$ to denote a function from $E$ to $[0, 1]$ that maps each edge to its corresponding weight, and we refer to the function $w$ as the weights. It is also assumed that the sum of the incoming edge weights at each node is at most one; that is, $\sum_{u \in N^{in}(v)} w(u, v) \leq 1$ for every $v \in V$, where $N^{in}(v)$ denotes the set of in-neighbors of $v$. The LT diffusion process also unfolds in discrete time steps. In step $0$, all nodes in the seed set $S$ become activated, and each non-seed node $v$ independently samples a threshold $\theta_v$ uniformly from $[0, 1]$. In each subsequent step $t \geq 1$, letting $A_{t-1}$ denote the set of nodes activated by the end of step $t - 1$, each inactive node $v$ becomes activated if

$$\sum_{u \in N^{in}(v) \cap A_{t-1}} w(u, v) \geq \theta_v.$$

This process terminates after a step in which no node changes its activation status.
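A matching sketch of one LT run, under the same assumed `graph` representation; thresholds are sampled uniformly and steps are synchronous, as in the model.

```python
import random

def simulate_lt(graph, seeds):
    """One run of the Linear Threshold process.

    graph: dict mapping u to a list of (v, w) pairs, with the sum of
           incoming weights at each node at most 1.
    seeds: seed set activated in step 0.
    Returns the set of all activated nodes.
    """
    # reorganize into incoming weights: in_w[v][u] = w(u, v)
    in_w = {}
    for u, out in graph.items():
        for v, w in out:
            in_w.setdefault(v, {})[u] = w
    theta = {v: random.random() for v in in_w}   # thresholds ~ U[0,1]
    active = set(seeds)
    while True:
        # synchronous step: new activations only take effect next step
        newly = {
            v for v, parents in in_w.items()
            if v not in active
            and sum(w for u, w in parents.items() if u in active) >= theta[v]
        }
        if not newly:
            break
        active |= newly
    return active
```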

Given a diffusion model, let $f(S, w)$ denote the expected number of nodes activated during the diffusion process given the seed set $S$ and diffusion parameters $w$. We say that $f$ is monotone if $f(S, w) \leq f(T, w)$ for any $S \subseteq T \subseteq V$. If $f(S \cup \{v\}, w) - f(S, w) \geq f(T \cup \{v\}, w) - f(T, w)$ for any $S \subseteq T \subseteq V$ and $v \in V \setminus T$, then we say that $f$ is submodular. kempe2003maximizing have shown that it is NP-hard to find $\max_{|S| \leq K} f(S, w)$ with respect to either the IC or the LT model. However, $f$ has been proved to be both monotone and submodular under the two diffusion models. As a result, a greedy algorithm can find a seed set $S$ such that $f(S, w) \geq (1 - 1/e) \max_{|S'| \leq K} f(S', w)$ (Nemhauser1978). Owing to these properties of monotonicity and submodularity, IC and LT have become the bases for many more complex diffusion models that were later developed.
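The greedy procedure behind the $(1 - 1/e)$ guarantee can be sketched as follows; `estimate_spread` is a hypothetical Monte Carlo estimator that could wrap either `simulate_ic` or `simulate_lt` above.

```python
def estimate_spread(graph, seeds, simulate, runs=1000):
    """Monte Carlo estimate of the expected number of activated nodes."""
    return sum(len(simulate(graph, seeds)) for _ in range(runs)) / runs

def greedy_seed_set(graph, nodes, k, simulate, runs=1000):
    """Greedily add the node with the largest estimated marginal gain."""
    seeds = set()
    for _ in range(k):
        base = estimate_spread(graph, seeds, simulate, runs) if seeds else 0.0
        best, best_gain = None, -1.0
        for v in nodes:
            if v in seeds:
                continue
            gain = estimate_spread(graph, seeds | {v}, simulate, runs) - base
            if gain > best_gain:
                best, best_gain = v, gain
        seeds.add(best)
    return seeds
```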

2.1 Negativity-aware diffusion models

The existing models for influence diffusion primarily focus on the spread of a single attitude, which we may regard as positive influence for simplicity. More precisely, whenever a user is influenced during the information-diffusion process, she adopts a positive attitude towards the information being spread. However, in practice, we cannot guarantee such uniformity in attitude, especially when the message being promoted is controversial in nature.

A few authors have been motivated to consider potential negative reactions and the spread of negative attitudes (chen2011influence, neg1, neg2, neg3, neg4). They propose new negativity-aware models that allow a node to become either positively or negatively influenced. In these models, the basic influence-maximization problem is to identify a seed set of size $K$ that maximizes the number of positively influenced nodes.

chen2011influence propose the first negativity-aware model. In addition to the influence probabilities $w$, they assume that there is a quality factor $q$ representing the quality of the product being promoted. While the activation process follows that of IC, once a node is chosen as a seed node or is activated by a positive upstream neighbor, it becomes positive with probability $q$ and negative with probability $1 - q$, independently of everything else. Meanwhile, if the node is influenced by a negative upstream neighbor, it becomes negative with certainty. Let us denote the expected final number of positively influenced nodes as $f^+(S)$. For a fixed $q$, it is shown by chen2011influence that $f^+$ is monotone and submodular. In addition, they show that if all edge weights equal one, then given a seed set $S$, the probability that a node turns positive decays geometrically in $q$ with the length of a directed shortest path from $S$ to that node in $G$. Their model has a strong negativity bias: any node activated by a negative upstream neighbor can only turn negative. In reality, however, when the information being propagated is controversial, a person might be influenced by her friends' strong attitudes to look into the issue, but can develop a different attitude towards it. Another limitation of this model is that $q$ cannot be a function of individual nodes, reflecting users' individual attitudes; it must be a uniform constant. Otherwise, the influence function is no longer monotone or submodular. In Section LABEL:apd:IC-N-break of the appendices, we provide an example in which the greedy algorithm can have an arbitrarily bad approximation ratio when the quality factors are heterogeneous.

neg1's model is richer, as there are four different types of users, i.e., (dis)satisfied and (non-)complainers. Each type has a different but fixed probability of participating in negative word-of-mouth. However, the model is still not flexible enough to account for the richness of user characteristics. Other more refined models are generally intractable (neg2, neg4, neg3). neg2 introduce an opinion indicator for each user. They propose a two-phase Linear Threshold-based model in which users' opinion indicators are updated according to the incoming influence from activated friends. Although it models nodes' attitudes using real numbers, the influence-maximization problem with respect to the proposed model is NP-hard to approximate within any constant approximation ratio. neg4 further improve upon neg2's model by including both opinion indicators and interactions, i.e., how information is perceived between two nodes. Their model is compatible with both LT and IC. However, the corresponding influence-maximization problem is still NP-hard to approximate within any constant approximation ratio. neg3 propose a continuous-time diffusion model using Poisson processes. Each node has its own attitude score that falls between 0 and 1. They also consider a negative seed set as well as a counter-response positive seed set. However, the model is too involved for theoretical analysis, and thus only empirical evaluations were conducted.

To the best of our knowledge, we are the first to propose negativity-aware diffusion models that are not only flexible enough to incorporate a variety of individual user characteristics but also have monotone submodular objective functions. We allow users with different characteristics to have different information-sharing behaviors and attitude-formation patterns. Owing to the monotonicity and submodularity of the objective functions, we can use the greedy algorithm to obtain a $(1 - 1/e - \varepsilon)$-approximate solution, where $\varepsilon$ is an error term caused by the (typical) use of simulation to evaluate the influence function.

2.2 Online learning for influence maximization

There is another line of work that focuses on the online-learning setting for influence maximization under the IC model (OIM, IMB, CUCB, contextualIM, DISBIM, wen2017online). In this setting, an agent starts with no knowledge of the edge weights and has $T$ rounds to advertise a product. In each round, she can select a seed set of up to $K$ nodes based on information observed in previous rounds, called feedback. The goal is to maximize the total expected influence spread over all rounds.

Two feedback mechanisms have been proposed. Under edge-semi-bandit feedback, the agent observes, for each activated node, whether its attempts to activate its downstream neighbors succeeded or not. Under node-level feedback, on the other hand, only the identities of the newly activated nodes in each diffusion step can be observed. More precisely, when a node $v$ is activated in step $t$ and more than one of its upstream neighbors were activated in step $t - 1$, it is not possible to discern which of these upstream neighbors activated $v$. For IC-based diffusions, either node-level or edge-level feedback can be assumed. LT-based models, on the other hand, assume a joint effort of active parent nodes in activating a given child node. As a result, for LT-based diffusions, node-level feedback is the only natural setup.

We are the first to provide an explicit regret guarantee for online learning under node-level feedback for an influence-maximization problem of any kind. To date, the edge-semi-bandit feedback setting has been well characterized by various authors (CUCB, wen2017online), but not the node-level feedback setting. IMB use Maximum Likelihood Estimation (MLE)-based techniques to learn from node-level feedback, but do not provide regret guarantees for their MLE-based learning algorithm.

3 Negativity-aware diffusion model

In this section, we introduce a new negativity-aware diffusion model based on the Linear Threshold model, which we refer to as the Negativity-Aware Linear Threshold (LT-N) model.

In LT-N, each node can be in one of three possible states at any time: positive, negative, or inactive. Positive (resp. negative) means that the node holds a positive (resp. negative) attitude towards the information being propagated. Meanwhile, inactive means that the node has not yet developed any attitude towards the information, due to, for example, lack of awareness. Let $+$, $-$, and $0$ denote the positive, negative, and inactive states, respectively. We assume that, initially, all nodes are in the inactive state; in other words, every node is in state $0$ at time $0$.

A person's attitude is determined not only by her friends' attitudes but also by her own experience and value judgment. To incorporate such personal bias, we introduce two autonomy factors $b_v^+, b_v^- \geq 0$ associated with each node $v$, such that $b_v^+ + b_v^- \leq 1$. The autonomy factors of each node depend on the information being promoted, as well as on the node's unique characteristics. In other words, $b_v^+$ (resp. $b_v^-$) is the weight that $v$ places on her own positive (resp. negative) attitude in responding to the information. The belief score

$$b_v = b_v^+ + b_v^-$$

measures the amount of trust that node $v$ places on her own judgment. Intuitively, the smaller $b_v$ is, the more susceptible $v$ is to others' attitudes. For now, we assume that $b_v^+$ and $b_v^-$ are both known.

A person also tends to place different weights on different friends' influences. We model this by having a weight $w(u, v)$ associated with each edge $(u, v) \in E$. The larger $w(u, v)$ is, the more influential $u$'s attitude is on $v$. We assume that for each node, the sum of the weights of its incoming edges lies between 0 and 1. More precisely, letting $N^{in}(v)$ be the set of in-neighbors of $v$, we assume that

$$0 \leq \sum_{u \in N^{in}(v)} w(u, v) \leq 1 \quad \text{for all } v \in V.$$

During the LT-N diffusion process, we assume that positive and negative influences from friends, rather than cancelling each other out, jointly prompt each person to take note of the information being diffused. Intuitively, the fact that a piece of information triggers different reactions among people around us should further pique our interest to learn about it, and eventually to develop our own attitude toward it. Accordingly, in our model, a node is activated the first time the sum of weights from its active neighbours exceeds a threshold. After being activated, the node decides on its attitude (positive or negative) based on the ratio between the positive influence (the sum of weights from positively activated friends) and the negative influence (the sum of weights from negatively activated friends) exerted on the node most recently, as well as the node's own belief score.

Mathematically, the LT-N diffusion process unfolds in discrete time steps as follows, starting from the nodes in the chosen seed set $S$. (We reserve "round" for the online-learning setting.)

  • Each non-seed node $v$ independently samples a threshold $\theta_v$ uniformly from $[0, 1]$.

  • In step $t = 0$, all nodes are inactive. Set $A_0 = A_0^+ = A_0^- = \emptyset$.

  • In step $t = 1$, all seed nodes become positive, and all non-seed nodes remain inactive. Set $A_1 = A_1^+ = S$ and $A_1^- = \emptyset$.

  • In general, let $A_t$ (resp. $A_t^+$, $A_t^-$) denote the set of nodes that are active (resp. positive, negative) by the end of time step $t$. In each subsequent time step $t \geq 2$, for each inactive node $v$, if

    $$\sum_{u \in N^{in}(v) \cap A_{t-1}} w(u, v) \geq \theta_v,$$

    then $v$ becomes active. It turns positive with probability

    $$p_v^+ = b_v^+ + (1 - b_v)\,\frac{\sum_{u \in N^{in}(v) \cap A_{t-1}^+} w(u, v)}{\sum_{u \in N^{in}(v) \cap A_{t-1}} w(u, v)} \qquad (1)$$

    and negative otherwise. Note that the probability of the node turning positive or negative is a convex combination of its own belief and the most recent influences from its active neighbours.

  • The process terminates when no more inactive nodes can be activated. Let $A$ (resp. $A^+$, $A^-$) denote the set of active (resp. positive, negative) nodes at the end of the process, which runs for at most $|V|$ steps.

Note that a node must become either positive or negative once activated. Meanwhile, as in the original LT diffusion model, the nodes that are activated in the current time step do not affect other nodes until the next time step.
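To make the process concrete, here is a minimal simulation sketch of one LT-N run; it follows the reconstruction of (1) above, and the variable names (`in_w`, `b_plus`, `b_minus`) are ours.

```python
import random

def simulate_ltn(in_w, b_plus, b_minus, seeds):
    """One run of the LT-N process, following the reconstruction of (1).

    in_w:    dict of dicts with in_w[v][u] = w(u, v), incoming weights.
    b_plus:  dict, b_plus[v] = positive autonomy factor b_v^+.
    b_minus: dict, b_minus[v] = negative autonomy factor b_v^-.
    seeds:   seed set; seed nodes start positive.
    Returns the sets of positive and negative nodes.
    """
    theta = {v: random.random() for v in in_w}   # thresholds ~ U[0,1]
    pos, neg = set(seeds), set()
    while True:
        newly = {}
        for v, parents in in_w.items():
            if v in pos or v in neg:
                continue
            w_pos = sum(w for u, w in parents.items() if u in pos)
            w_neg = sum(w for u, w in parents.items() if u in neg)
            total = w_pos + w_neg
            if total > 0 and total >= theta[v]:
                # probability of turning positive: a convex combination of
                # the node's own belief and its neighbours' attitudes, as in
                # the reconstructed equation (1)
                b = b_plus[v] + b_minus[v]
                p = b_plus[v] + (1 - b) * w_pos / total
                newly[v] = '+' if random.random() < p else '-'
        if not newly:
            break
        for v, sign in newly.items():   # synchronous update
            (pos if sign == '+' else neg).add(v)
    return pos, neg
```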

4 Influence Maximization

Under the LT-N model, we consider the problem of choosing at most $K$ seed nodes to maximize the expected number of positive nodes at the end of the diffusion process. More rigorously, let $f^+(S)$ be the expected number of positive nodes at the end of the diffusion process under LT-N given seed set $S$. Our goal is to maximize $f^+(S)$, subject to the cardinality constraint $|S| \leq K$ for some positive integer $K$.

This problem is an extension of influence maximization under the original Linear Threshold model, which is NP-hard (kempe2003maximizing). In Theorem 4 below, we prove that $f^+$ is monotone submodular under LT-N. Nemhauser1978 show that when the set function one wants to maximize is monotone and submodular, the greedy algorithm guarantees a $(1 - 1/e)$-approximation. Therefore, it follows that greedy is a $(1 - 1/e)$-approximation algorithm for our problem.

Theorem 4. Let $f^+(S)$ be the expected number of positive nodes at the end of the diffusion process under LT-N given seed set $S$. Then $f^+$ is monotone submodular.

Proof sketch. We define another diffusion model that we call the negativity-aware triggering set model (TS-N). We first show that the expected positive-influence spread function of TS-N is monotone submodular. We then show that the set of positively (negatively) activated nodes in each step of LT-N has the same distribution as that in TS-N. In this way, we conclude that the expected positive-influence function of LT-N is also monotone submodular. The details of the proof are included in Section LABEL:sec:LT-N-mon-sub of the appendices.

5 Learning From Node-Level Feedback

The previous two sections are based on the assumption that both the edge weights and the autonomy factors are known. In this section, we consider an online-learning setting of this problem; namely, the autonomy factors and the edge weights are initially unknown and need to be gradually learned. We further assume that the autonomy factors and the edge weights admit linear generalizations. More specifically, we assume that there exist two unknown vectors $\theta^* \in \mathbb{R}^d$ and $\beta^* \in \mathbb{R}^{d'}$. For each edge $e \in E$, we have a known feature vector $x_e \in \mathbb{R}^d$ such that $w(e) = x_e^{\top}\theta^*$. For each node $v \in V$, we have two known feature vectors $z_v^+, z_v^- \in \mathbb{R}^{d'}$ such that $b_v^+ = (z_v^+)^{\top}\beta^*$ and $b_v^- = (z_v^-)^{\top}\beta^*$. With these linear generalizations, learning the autonomy factors and the weights amounts to learning the corresponding unknown vectors $\theta^*$ and $\beta^*$.

Recall that the node-activation process in our LT-N model follows the classical LT model. After a node $v$ is activated, the sign of the activation (positive or negative) depends on both the autonomy factors and the most recent influences from its active friends, as defined in (1):

$$p_v^+ = b_v^+ + (1 - b_v)\,\frac{\sum_{u \in N^{in}(v) \cap A_{t-1}^+} w(u, v)}{\sum_{u \in N^{in}(v) \cap A_{t-1}} w(u, v)},$$

where $b_v = b_v^+ + b_v^-$ is the belief score of $v$ that was defined previously.

Our plan is to learn $\theta^*$ for the weight function $w$ from the node-activation observations. For this part, we do not use the observed signs of the activations. Therefore, our result applies to the online-learning setting with respect to the classical LT model, and for this reason we present the learning framework under the classical LT model in this section. For learning $\beta^*$ for the autonomy factors with respect to LT-N, we use the signs of the observed node activations; in the next section, we extend the framework to account for the learning of $\beta^*$.

Consider the classical LT model: in each round, the agent activates a seed set that initiates the information diffusion on the network. Most existing online influence-maximization literature, with IC as the underlying diffusion model, assumes an edge-level feedback model in which the status of each edge that takes part in the diffusion can be observed. In our node-level feedback model, by contrast, the agent can only observe node statuses. More specifically, in each diffusion time step, the agent observes whether or not a node becomes positively or negatively activated or remains inactive, but she does not get to observe how each of its active parents contributes to this node's activation. Since several edges may contribute to a node's activation simultaneously, it is hard to discriminate the contribution of each individual edge and estimate the edge weights accurately. Because of this difficulty, online learning with node-level feedback has remained largely unexplored until this work.

In this section, we first mathematically formulate the online-learning problem and discuss the assumptions we impose. We then investigate the key obstacles to learning and propose an algorithm that performs weight estimation and selects seed sets in each round. Finally, we conduct a theoretical analysis of the performance of the algorithm. Specifically, we show that the average cumulative regret of Algorithm LABEL:alg:IM02 with hyper-parameter $\rho \in (0, 1)$ is bounded by $\tilde{O}(T^{-\rho})$, where $T$ is the total number of rounds. This improves upon the average cumulative regret of $\tilde{O}(1/\sqrt{T})$ obtained by wen2017online for an online influence-maximization problem with edge-level feedback. It is worth noting that the same regret bounds can be achieved if we apply Algorithm LABEL:alg:IM02 to edge-level feedback problems.

5.1 Learning in the classical LT model

wen2017online have investigated the performance of the edge-level feedback IC model and proposed a UCB-type learning algorithm, IMLinUCB. However, it is hard to extend their work to any node-level feedback model such as LT-N. The main challenge comes from parameter estimation. In round $t$, IMLinUCB estimates $\theta^*$ using ridge regression with realizations of independent Bernoulli trials on the edges observed so far. Denote the ridge-regression estimate in round $t$ by $\hat{\theta}_t$. IMLinUCB constructs a confidence ball around $\hat{\theta}_t$ and derives an upper confidence bound (UCB) weight $U_t(e)$ for every edge $e$. IMLinUCB then selects the seed set by feeding $G$ and $U_t$ to a greedy approximation oracle.

Intuitively, with more observations, the upper confidence bound $U_t(e)$ converges to the true weight $w(e)$ for each edge $e$, thus making the selected seed set a constant-factor approximation to the optimum.

Unfortunately, the structural similarities between their IM problem and the classical linear contextual bandit problem, and between IMLinUCB and the classical algorithms in abbasi2011improved, no longer hold under node-level feedback. As several edges can simultaneously contribute to the activation of a single node, it is generally not possible to estimate the weight and upper bound on each individual edge accurately. Consider a simple example as illustrated in Figure 1. We have two edges $e_1$ and $e_2$ with corresponding features $x_1$ and $x_2$, and these two edges are always observed simultaneously. In this case, with more observations, the estimate of the combined weight $w(e_1) + w(e_2)$ converges to its true value. However, if we try to estimate $w(e_1)$ and $w(e_2)$ separately, then, since these two edges are always observed together, any pair of weights with the correct sum explains the observations equally well. This example shows that the upper confidence bound of each individual edge does not necessarily converge to its true weight even if we have infinitely many observations of this edge.

Figure 1: A case where edge weights and upper confidence bounds cannot be estimated accurately from node-level observations.

The example above indicates that there is no quick extension of IMLinUCB for LT-N. Thus, further assumptions, as well as more sophisticated algorithms, are required to ensure increasingly accurate edge-weight estimation as more node-level realizations are observed.
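The same non-identifiability can be checked numerically; below is a small sketch with hypothetical orthogonal features, showing that joint observations leave the Gram matrix rank-deficient.

```python
import numpy as np

# Hypothetical features for two edges that are always observed together.
x1 = np.array([1.0, 0.0])
x2 = np.array([0.0, 1.0])

# Node-level feedback only reveals the combined feature x1 + x2 and the
# combined activation outcome, so every observation has the same design row.
X = np.tile(x1 + x2, (1000, 1))   # 1000 joint observations
M = X.T @ X                       # Gram matrix of the observations

# M is rank 1: the direction x1 - x2 is never observed, so w(e1) and w(e2)
# cannot be separated no matter how many joint observations we collect.
print(np.linalg.matrix_rank(M))   # -> 1
print(np.linalg.eigvalsh(M))      # one eigenvalue is exactly zero
```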

Technical assumptions

Recall that in each round $t$, the nodes are activated in discrete time steps according to our LT-N model, with nodes in the seed set $S_t$ being activated in time step $1$ of round $t$. For each node $v$, we use $\tau_v$ to denote the time step at which node $v$ becomes activated in round $t$. When $v \in S_t$, we have $\tau_v = 1$. If $v$ is not activated in round $t$, then we set $\tau_v = \infty$.

For each node $v$, define its relevant parents in round $t$ as

$$\mathcal{P}_{v,t} = \{u \in N^{in}(v) : \tau_u < \tau_v\}.$$

That is, the set of relevant parents is the set of nodes that are relevant to the activation status of $v$ in round $t$. We say the weight on an edge $(u, v)$ is active if $u$ has been activated. When $\tau_v < \infty$, $\mathcal{P}_{v,t}$ is the set of parent nodes that collectively push the sum of active incoming weights at $v$ over $v$'s threshold for the first time. When $\tau_v = \infty$, $\mathcal{P}_{v,t}$ is the set of parent nodes that have collectively failed to push the sum of active incoming weights at $v$ over its threshold. Note that for an inactive node $v$ in round $t$, $\mathcal{P}_{v,t}$ might not be empty, since some of its parent nodes might have been activated during the diffusion process but failed to activate $v$.

Our analysis is based on a few assumptions on the weights and solution stability, which we will state and justify below.

We first introduce assumptions on the edge weights. The first is a linear generalization of the edge weights: we assume that each edge $e$ has an edge feature vector $x_e \in \mathbb{R}^d$ that characterizes the relationship between $e$'s two end nodes, and that the weight on each edge is a linear mapping of its feature. More formally:

Assumption (Linear parameterization). There exists $\theta^* \in \mathbb{R}^d$ such that the true edge weight of each edge $e \in E$ is $w(e) = x_e^{\top}\theta^*$.

By the assumption on the incoming weight sums of our LT model, we have $\sum_{u \in N^{in}(v)} x_{(u,v)}^{\top}\theta^* \leq 1$ for all $v \in V$. Such a linear generalization of diffusion parameters is also used in wen2017online. The generalization makes our learning model more scalable.

We use $\theta$ to denote a generic vector in $\mathbb{R}^d$ and refer to it as the parameter. We denote the true parameter by $\theta^*$. Similar to the assumption for LT-N, we assume $0 \leq x_e^{\top}\theta^* \leq 1$ for all $e \in E$. Furthermore, we assume that the "aggregated" features are bounded too:

Assumption (Feature regularity). For all $v \in V$ and all $E' \subseteq \{(u, v) : u \in N^{in}(v)\}$, we have $\left\|\sum_{e \in E'} x_e\right\|_2 \leq 1$.

Note that the feature regularity assumption is similar to the feature-boundedness assumptions in many existing works on contextual linear bandit problems. For example, wen2017online assume that the norms of the edge features are bounded. Similar assumptions are also made by abbasi2011improved and chu2011contextual. In addition, the LT-N model, like any other LT model, requires the sum of the weights of the incoming edges at every node to be bounded by one. It is thus natural to assume that the norm of the sum of any subset of incoming features at every node is bounded by 1, which can always be achieved by an appropriate scaling of the feature space.

One of our key ideas is to make sure that the features of observed edges are diverse enough to allow information to be collected on all directions of $\theta^*$, so that the estimate converges to $\theta^*$ as the number of rounds grows. More specifically, we impose a feature diversity assumption as follows:

Assumption (Feature diversity). There exist edges $e_1, \dots, e_m \in E$ such that the matrix $\Sigma = \sum_{i=1}^{m} x_{e_i} x_{e_i}^{\top}$ is positive definite. In other words, the minimum eigenvalue $\lambda_{\min}(\Sigma)$ is strictly greater than $0$.

It is easy to see that the existence of $d$ edges with linearly independent features is sufficient to ensure the feature diversity assumption. This should be easy to satisfy, as the dimension of the features is usually much smaller than the total number of edges, that is, $d \ll |E|$. Under this assumption, if we keep exploring those edges, the confidence region will shrink in all feature directions, so that $\hat{\theta}_t \to \theta^*$ as $t \to \infty$.
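Verifying feature diversity for a candidate exploration set reduces to a minimum-eigenvalue computation; a small sketch (our helper, not from the paper):

```python
import numpy as np

def is_diverse(features, tol=1e-9):
    """Check the feature diversity condition for a candidate exploration set.

    features: (m, d) array whose rows are the edge features x_{e_1},...,x_{e_m}.
    Returns True if sum_i x_i x_i^T is positive definite, i.e., its minimum
    eigenvalue is strictly positive.
    """
    X = np.asarray(features, dtype=float)
    sigma = X.T @ X                      # sum of outer products x_i x_i^T
    return np.linalg.eigvalsh(sigma)[0] > tol   # eigvalsh sorts ascending
```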

Let $K$ be the maximum seed-set cardinality. Let $f(S, w)$ be the expected total influence given seed set $S$ and edge weights $w$. For weights $w$, denote by $S^*(w) \in \arg\max_{|S| \leq K} f(S, w)$ an optimal seed set and by $f^*(w) = f(S^*(w), w)$ the corresponding optimal value. As discussed in Section 4, the objective function is monotone and submodular, so a greedy algorithm with exact evaluations of $f$ returns a $(1 - 1/e)$-approximation solution. Since evaluating $f$ is #P-hard, we assume access to an approximation oracle:

Assumption (Approximation oracle). For $\alpha, \gamma \in (0, 1)$, there exists an efficient, possibly random, $(\alpha, \gamma)$-oracle that takes $G$, $K$, and $w$ as inputs and outputs a seed set $S$ such that $f(S, w) \geq \alpha f^*(w)$ with probability at least $\gamma$. An example of $\alpha$ is $1 - 1/e - \varepsilon$.

The reverse reachable set method in tang2014influence can easily be extended to obtain such an approximation oracle.

We also impose an assumption on the solution stability of the network. Consider the solutions returned by the approximation oracle around the true weights $w^* = w_{\theta^*}$. Define the set of $\alpha$-approximation seed sets with respect to weights $w$ as $\mathcal{S}_\alpha(w) = \{S : |S| \leq K,\; f(S, w) \geq \alpha f^*(w)\}$. That is, $\mathcal{S}_\alpha(w)$ is the set of seed sets whose expected influences with respect to edge weights $w$ are at least $\alpha$ times that of an optimal seed set. Our next assumption states that the set of $\alpha$-approximation sets and the optimal seed set are invariant under small perturbations.

Assumption (Stability). There exists a constant $\delta > 0$ such that for any weights $w$ satisfying $\max_{e \in E} |w(e) - w^*(e)| \leq \delta$, we have $\mathcal{S}_\alpha(w) = \mathcal{S}_\alpha(w^*)$. Moreover, there exists a seed set $S^{opt}$ such that $S^{opt} \in \arg\max_{|S| \leq K} f(S, w)$ for all $w$ satisfying $\max_{e \in E} |w(e) - w^*(e)| \leq \delta$.

The above assumption is satisfied under mild conditions. In the lemma below, we provide a sufficient condition for the stability assumption to hold, and show that this condition holds with probability one.

Lemma. Let $\theta^*$ be the true parameter, $\mathcal{S}_\alpha(w^*)$ be the set of $\alpha$-approximation sets with respect to $w^* = w_{\theta^*}$, and $f^*(w^*)$ be the corresponding optimal value. The stability assumption holds whenever $f(S, w^*) \neq \alpha f^*(w^*)$ for every seed set $S$ with $|S| \leq K$. This sufficient condition holds with probability 1 if $\alpha$ is sampled uniformly from an interval of feasible approximation ratios.

The proof of this lemma is in Section LABEL:apd:online-lemmas-thm:2 of the appendices. It shows that the stability assumption fails only when there is a seed set whose expected influence is exactly $\alpha$ times the optimal value, which happens with probability zero.

The stability assumption is crucial for analyzing the theoretical performance of our algorithm. Suppose the current estimate $w$ is close enough to $w^*$ that $\max_{e \in E} |w(e) - w^*(e)| \leq \delta$, and the greedy algorithm successfully returns a size-$K$ $\alpha$-approximation solution $S$ with respect to $w$, i.e., $S \in \mathcal{S}_\alpha(w)$. By the stability assumption, we have $\mathcal{S}_\alpha(w) = \mathcal{S}_\alpha(w^*)$, which implies that $S$ is also an $\alpha$-approximate solution with respect to $w^*$, and so it yields zero scaled regret. Although the stability assumption is quite general and opens the possibility of better theoretical guarantees for online learning, it has not been exploited by any previous algorithm, to the best of our knowledge.

Performance metrics

One of the most important metrics for evaluating the performance of online learning algorithms is the average regret, that is, the cumulative loss in reward divided by the horizon length $T$. This cumulative loss is incurred due to inaccurate estimation of the edge weights and the random nature of the $(\alpha, \gamma)$-approximation oracle invoked. It is worth noting that the loss from a random oracle cannot be reduced even if the true edge weights were known.

To analyze the performance of our online learning algorithms, we adopt the average scaled regret proposed in wen2017online. In particular, let $S^{opt}$ be the optimal size-$K$ seed set with respect to the true parameter $\theta^*$, and let $S_t$ be the seed set selected in round $t$. We consider the metric

$$R^{\eta}(T) = \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\left[ f(S^{opt}, w^*) - \frac{1}{\eta} f(S_t, w^*) \right],$$

where $T$ is the total number of rounds in a finite horizon, and $\eta = \alpha\gamma$ is the scaling factor. When $\eta = 1$, $R^{\eta}(T)$ reduces to the standard expected average regret.
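Given per-round expected influences, the metric can be computed directly; a small sketch with hypothetical names:

```python
def average_scaled_regret(opt_value, round_values, eta):
    """Average eta-scaled regret over T rounds, per the metric above.

    opt_value:    f(S_opt, w*), the expected influence of the optimal seed set.
    round_values: list of f(S_t, w*) for the seed sets chosen in rounds 1..T.
    eta:          scaling factor (eta = alpha * gamma); eta = 1 recovers the
                  standard expected average regret.
    """
    T = len(round_values)
    return sum(opt_value - f_t / eta for f_t in round_values) / T
```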

Algorithms

Under the assumptions introduced above, we propose online learning algorithms to learn the true parameter $\theta^*$ and select seed sets effectively.

Let $\mathcal{E} = \{e_1, \dots, e_m\}$ be the exploration set, consisting of diverse edges satisfying the feature diversity assumption. We partition the time horizon into multiple epochs, each having a number of exploration and exploitation rounds that depends on a hyper-parameter $\rho \in (0, 1)$. Specifically, the $k$-th epoch consists of $m$ exploration rounds and a subsequent block of exploitation rounds whose length grows with $k$ at a rate governed by $\rho$. We let $t_k$ denote the index of the first round of epoch $k$, and let $\mathcal{T}^{explore}_k$ and $\mathcal{T}^{exploit}_k$ denote the series of exploration and exploitation rounds in the $k$-th epoch, respectively.

We say that the activation status of a node is observed in round $t$ if any of its parent nodes is activated in the diffusion of round $t$, and unobserved otherwise. In round $t$, for any node $v$ whose activation status is observed, denote

$$x_{v,t} = \sum_{u \in \mathcal{P}_{v,t}} x_{(u,v)} \qquad (2)$$

as the feature for the combined edge weights that are relevant to the activation status of $v$, and let

$$y_{v,t} = \begin{cases} 1 & \text{if } v \text{ is activated in round } t, \\ 0 & \text{otherwise.} \end{cases} \qquad (3)$$

Use $O_t$ to denote the set of observed nodes in round $t$.

In epoch $k$, our first algorithm runs as follows:

  • The algorithm first runs $m$ exploration rounds. In the $i$-th exploration round of epoch $k$, it selects the single seed node $u_i$, where $e_i = (u_i, v_i)$ is the $i$-th edge in the exploration set $\mathcal{E}$. For each node $v$ whose activation status is observed in the current round $t$, we construct $x_{v,t}$ and $y_{v,t}$ as outlined in (2) and (3). For node $v_i$ in particular, we construct $x_{v_i,t} = x_{e_i}$ and $y_{v_i,t} = \mathbb{1}\{v_i \text{ is activated in the first diffusion step}\}$.

  • After the exploration rounds in epoch $k$, construct the least-squares estimate using the observed node-activation statuses as follows. Let the covariance matrix and the corresponding reward be

    $$M_k = \sum_{t \in \mathcal{T}^{explore}_1 \cup \cdots \cup \mathcal{T}^{explore}_k} \sum_{v \in O_t} x_{v,t} x_{v,t}^{\top}, \qquad b_k = \sum_{t \in \mathcal{T}^{explore}_1 \cup \cdots \cup \mathcal{T}^{explore}_k} \sum_{v \in O_t} y_{v,t}\, x_{v,t}. \qquad (4)$$

    Define $\hat{\theta}_k = M_k^{-1} b_k$.

  • The algorithm then runs exploitation rounds. At the beginning of each exploitation round $t \in \mathcal{T}^{exploit}_k$, using the observed node-activation statuses from the exploration rounds of the first $k$ epochs, compute the estimated weights $w_{\hat{\theta}_k}(e) = x_e^{\top}\hat{\theta}_k$ and invoke the $(\alpha, \gamma)$-oracle on $G$ with parameters $K$ and $w_{\hat{\theta}_k}$ to obtain the seed set $S_t$.

Note that in the presentation of the algorithm above, we use only the node observations from exploration rounds to update our estimate of $\theta^*$. We do so to simplify the notation in the later analysis. In practice, one can use observations from all rounds, including exploitation rounds, to update the estimate of $\theta^*$. The theoretical analysis remains the same, but the notation becomes more convoluted. The complete algorithm is summarized in Algorithm LABEL:alg:IM02, and a schematic sketch of one epoch is given below.
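The sketch below illustrates one epoch under the constructions (2)-(4); `env.diffuse` and `oracle` are hypothetical stand-ins for the diffusion environment and the $(\alpha, \gamma)$-oracle, and the bookkeeping names are ours.

```python
import numpy as np

def run_epoch(explore_edges, features, n_exploit, env, oracle, state):
    """One epoch of the algorithm: explore, refit, then exploit.

    explore_edges: list of edges (u, v) satisfying feature diversity.
    features:      dict mapping each edge e to its feature vector x_e.
    n_exploit:     number of exploitation rounds in this epoch.
    env:           hypothetical environment; env.diffuse(seeds) runs one
                   round and returns (x_{v,t}, y_{v,t}) pairs, per (2)-(3),
                   for every node observed in that round.
    oracle:        hypothetical (alpha, gamma)-oracle taking edge weights.
    state:         dict with running sums 'M' (d x d) and 'b' (d,); 'M' can
                   be initialized to a small multiple of the identity so the
                   solve below is well defined before enough data arrives.
    """
    # exploration rounds: seeding only u isolates the exploration edge (u, v)
    for u, v in explore_edges:
        for x, y in env.diffuse(seeds={u}):
            state['M'] += np.outer(x, x)      # covariance matrix in (4)
            state['b'] += y * x               # "reward" vector in (4)

    # least-squares estimate of theta*
    theta_hat = np.linalg.solve(state['M'], state['b'])

    # exploitation rounds: plug the estimated weights into the oracle
    w_hat = {e: float(features[e] @ theta_hat) for e in features}
    for _ in range(n_exploit):
        env.diffuse(seeds=oracle(w_hat))
    return theta_hat
```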

It is worth noting that to ensure the full observability of the exploration edge $e_i = (u_i, v_i)$, we select only a single seed node in each exploration round. This way, no other edges are involved in the attempt to activate $v_i$ in the first diffusion step of each exploration round. In practice, instead of selecting only $u_i$ to make sure edge $e_i$'s realization is observed, we can select $u_i$ together with a set of seed nodes that are not connected to $v_i$.

Note also that the number of exploration rounds is fixed at $m$ in each epoch, while the number of exploitation rounds in the $k$-th epoch grows with $k$. Thus, the ratio between the numbers of exploration and exploitation rounds decreases as the epoch index increases. Intuitively, each exploration round incurs a constant regret. As the estimate $\hat{\theta}_k$ gets closer to the true parameter $\theta^*$, we can gradually decrease the proportion of exploration rounds to reduce the contribution of exploration to the total regret. At the same time, insufficient exploration could make $\hat{\theta}_k$ inaccurate, which might lead to sub-optimal seed selection and increase the total regret. Thus, a balance of exploration and exploitation is required to achieve minimum total regret.

In the rest of this section, we provide a theoretical analysis of Algorithm LABEL:alg:IM02 and, based on all the assumptions made above, derive an average per-round regret of $\tilde{O}(T^{-\rho})$, where $T$ is the total number of rounds and $\rho \in (0, 1)$ is the hyper-parameter.

5.2 Regret analysis

Theorem. Assume that the linear parameterization, feature regularity, feature diversity, approximation oracle, and stability assumptions hold. Then the average regret of running Algorithm LABEL:alg:IM02 for $T$ rounds is $\tilde{O}(T^{-\rho})$, where $\rho \in (0, 1)$ is the hyper-parameter of the algorithm.
