
# Robust Budget Allocation via Continuous Submodular Functions

## Abstract

The optimal allocation of resources for maximizing influence, spread of information or coverage has gained attention in recent years, in particular in machine learning and data mining. But in applications, the parameters of the problem are rarely known exactly, and using wrong parameters can lead to undesirable outcomes. We hence revisit a continuous version of the Budget Allocation or Bipartite Influence Maximization problem introduced by Alon et al. [2012] from a robust optimization perspective, where an adversary may choose the least favorable parameters within a confidence set. The resulting problem is a nonconvex-concave saddle point problem (or game). We show that this nonconvex problem can be solved exactly by leveraging connections to continuous submodular functions, and by solving a constrained submodular minimization problem. Although constrained submodular minimization is hard in general, we establish conditions under which such a problem can be solved to arbitrary precision.

## 1 Introduction

The optimal allocation of resources for maximizing influence, spread of information or coverage, has gained attention in the past few years, in particular in machine learning and data mining [Domingos & Richardson, 2001; Kempe et al., 2003; Chen et al., 2009; Gomez Rodriguez & Schölkopf, 2012; Borgs et al., 2014].

In the Budget Allocation Problem, one is given a bipartite influence graph between a set $S$ of channels and a set $T$ of people, with edge set $E$, and the task is to assign a budget $y(s)$ to each channel $s \in S$ with the goal of maximizing the expected number of influenced people. Each edge $(s,t) \in E$ between channel $s$ and person $t$ is weighted with a probability $p_{st}$ that, e.g., an advertisement on radio station $s$ will influence person $t$ to buy some product. The budget $y(s)$ controls how many independent attempts are made via the channel $s$ to influence the people in $T$. The probability that a customer $t$ is influenced when the advertising budget is $y$ is

$$I_t(y) = 1 - \prod_{(s,t) \in E} [1 - p_{st}]^{y(s)}, \tag{1}$$

and hence the expected number of influenced people is $I(y) = \sum_{t \in T} I_t(y)$. We write $I(y; p)$ to make the dependence on the probabilities $p_{st}$ explicit. The total budget $y$ must remain within some feasible set $\mathcal{Y}$, which may encode e.g. a total budget limit. We allow the budgets $y$ to be continuous, as in [Bian et al., 2017].
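To make Eq. (1) concrete, the following is a minimal sketch of evaluating the expected influence with NumPy. The function name `influence`, the dense matrix layout, and the convention that a zero probability encodes a missing edge are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def influence(y, p):
    """Expected number of influenced people, following Eq. (1).

    y: shape (S,) nonnegative budgets, one per channel.
    p: shape (S, T) transmission probabilities; p[s, t] = 0 encodes "no edge"
       (an assumption of this sketch).
    """
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    # Probability that every attempt on person t fails: prod_s (1 - p_st)^y(s).
    fail = np.prod((1.0 - p) ** y[:, None], axis=0)
    return float(np.sum(1.0 - fail))

# Two channels, two people; channel 0 has no edge to person 1.
p = np.array([[0.5, 0.0],
              [0.5, 0.5]])
total = influence([1.0, 2.0], p)
```

Because the budget enters only as an exponent, the same function accepts fractional budgets, which is the continuous setting considered here.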

Since its introduction by Alon et al. [2012], several works have extended the formulation of Budget Allocation and provided algorithms [Bian et al., 2017; Hatano et al., 2015; Maehara et al., 2015; Soma et al., 2014; Soma & Yoshida, 2015]. Budget Allocation may also be viewed as influence maximization on a bipartite graph, where information spreads as in the Independent Cascade model. For integer $y$, Budget Allocation and Influence Maximization are NP-hard. Yet, constant-factor approximations are possible, and build on the fact that the influence function is submodular in the binary case, and DR-submodular in the integer case [Soma et al., 2014; Hatano et al., 2015]. If $y$ is continuous, the problem is a concave maximization problem.

The formulation of Budget Allocation assumes that the transmission probabilities are known exactly. But this is rarely true in practice. Typically, the probabilities $p_{st}$, and possibly the graph itself, must be inferred from observations [Gomez Rodriguez et al., 2010; Du et al., 2013; Narasimhan et al., 2015; Du et al., 2014; Netrapalli & Sanghavi, 2012]. In Section 4 we will see that a misspecification or point estimate of parameters can lead to much reduced outcomes. A more realistic assumption is to know confidence intervals for the $p_{st}$. Realizing this severe deficiency, recent work studied robust versions of Influence Maximization, where a budget $y$ must be chosen that maximizes the worst-case approximation ratio over a set of possible influence functions [He & Kempe, 2016; Chen et al., 2016; Lowalekar et al., 2016]. The resulting optimization problem is hard but admits bicriteria approximations.

In this work, we revisit Budget Allocation under uncertainty from the perspective of robust optimization [Bertsimas et al., 2011; Ben-Tal et al., 2009]. We maximize the worst-case influence – not approximation ratio – for $p$ in a confidence set $\mathcal{P}$ centered around the “best guess” (e.g., posterior mean). This avoids pitfalls of the approximation ratio formulation (which can be misled to return poor worst-case budgets, as demonstrated in Appendix A), while also allowing us to formulate the problem as a max-min game:

$$\max_{y \in \mathcal{Y}} \min_{p \in \mathcal{P}} I(y; p), \tag{2}$$

where an “adversary” can arbitrarily manipulate $p$ within the confidence set $\mathcal{P}$. With $p$ fixed, $I(y; p)$ is concave in $y$. However, the influence function $I(y; p)$ is not convex, and not even quasiconvex, in the adversary’s variables $p_{st}$.

The new, key insight we exploit in this work is that $I(y; p)$ has the property of continuous submodularity in $p$ – in contrast to previously exploited submodular maximization in $y$ – and can hence be minimized by generalizing techniques from discrete submodular optimization [Bach, 2015]. The techniques in [Bach, 2015], however, are restricted to box constraints, and do not directly apply to our confidence sets. In fact, general constrained submodular minimization is hard [Svitkina & Fleischer, 2011; Goel et al., 2009; Iwata & Nagano, 2009]. We make the following contributions:

1. We present an algorithm with optimality bounds for Robust Budget Allocation in the nonconvex adversarial scenario (2).

2. We provide the first results for continuous submodular minimization with box constraints and one more “nice” constraint, and conditions under which the algorithm is guaranteed to return a global optimum.

### 1.1 Background and Related Work

We begin with some background material and, along the way, discuss related work.

#### Submodularity over the integer lattice and continuous domains

Submodularity is perhaps best known as a property of set functions. A function $F$ defined on subsets of a ground set $V$ is submodular if for all sets $A, B \subseteq V$, it holds that $F(A) + F(B) \geq F(A \cup B) + F(A \cap B)$. A similar definition extends to functions defined over a distributive lattice $\mathcal{L}$, e.g. the integer lattice. Such a function $f$ is submodular if for all $x, y \in \mathcal{L}$, it holds that

$$f(x) + f(y) \geq f(x \vee y) + f(x \wedge y). \tag{3}$$

For the integer lattice and vectors $x, y$, $x \vee y$ denotes the coordinate-wise maximum and $x \wedge y$ the coordinate-wise minimum. Submodularity has also been considered on continuous domains, where, if $f$ is also twice-differentiable, the property of submodularity means that all off-diagonal entries of the Hessian are nonpositive, i.e., $\frac{\partial^2 f(x)}{\partial x_i \partial x_j} \leq 0$ for all $i \neq j$ [Topkis, 1978, Theorem 3.2]. These functions may be convex, concave, or neither.
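As an illustration of inequality (3), the sketch below checks lattice submodularity by brute force on a small integer grid. The helper name `is_lattice_submodular` and the two test functions are hypothetical; the check is exponential in the dimension and is meant only to make the definition tangible.

```python
import itertools
import numpy as np

def is_lattice_submodular(f, upper, tol=1e-12):
    """Brute-force check of inequality (3) on the grid {0..upper[0]} x {0..upper[1]} x ..."""
    points = [np.array(pt) for pt in itertools.product(*(range(k + 1) for k in upper))]
    for x in points:
        for y in points:
            join = np.maximum(x, y)   # coordinate-wise maximum
            meet = np.minimum(x, y)   # coordinate-wise minimum
            if f(x) + f(y) < f(join) + f(meet) - tol:
                return False
    return True

# -x0*x1 has nonpositive mixed differences, so it is submodular; +x0*x1 is not.
sub = is_lattice_submodular(lambda x: -x[0] * x[1], upper=(3, 3))
sup = is_lattice_submodular(lambda x: x[0] * x[1], upper=(3, 3))
```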

Submodular functions on lattices can be minimized by a reduction to set functions, more precisely, ring families [Birkhoff, 1937]. Combinatorial algorithms for submodular optimization on lattices are discussed in [Khachaturov et al., 2012]. More recently, Bach [2015] extended results based on the convex Lovász extension, by building on connections to optimal transport. The subclass of $L^\natural$-convex functions admits strongly polynomial time minimization [Murota, 2003; Kolmogorov & Shioura, 2009; Murota & Shioura, 2014], but does not apply in our setting.

Similarly, results for submodular maximization extend to integer lattices, e.g. [Gottschalk & Peis, 2015]. Stronger results are possible if the submodular function also satisfies diminishing returns: for all $x \leq y$ (coordinate-wise) and $i$ such that $x + e_i$ and $y + e_i$ remain in the domain, it holds that $f(x + e_i) - f(x) \geq f(y + e_i) - f(y)$. For such DR-submodular functions, many approximation results for the set function case extend [Bian et al., 2017; Soma & Yoshida, 2015; Soma et al., 2014]. In particular, Ene & Nguyen [2016] show a generic reduction to set function optimization that they apply to maximization. In fact, it also applies to minimization:

###### Proposition 1.1.

A DR-submodular function defined on can be minimized in strongly polynomial time , where and is the time complexity of evaluating . Here, .

###### Proof.

The function can be reduced to a submodular set function via [Ene & Nguyen, 2016], where . The function can be evaluated via mapping from to the domain of , and then evaluating , in time . We can directly substitute these complexities into the runtime bound from [Lee et al., 2015]. ∎

In particular, the time complexity is logarithmic in . For general lattice submodular functions, this is not possible without further assumptions.

#### Related Problems

A sister problem of Budget Allocation is Influence Maximization on general graphs, where a set of seed nodes is selected to start a propagation process. The influence function is still monotone submodular and amenable to the greedy algorithm Kempe et al. [2003], but it cannot be evaluated explicitly and requires approximation Chen et al. [2010]. Stochastic Coverage [Goemans & Vondrák, 2006] is a version of Set Cover where the covering sets are random. A variant of Budget Allocation can be written as stochastic coverage with multiplicity. Stochastic Coverage has mainly been studied in the online or adaptive setting, where logarithmic approximation factors can be achieved [Golovin & Krause, 2011; Deshpande et al., 2016; Adamczyk et al., 2016].

Our objective function (2) is a signomial in , i.e., a linear combination of monomials of the form . General signomial optimization is NP-hard [Chiang, 2005], but certain subclasses are tractable: posynomials with all nonnegative coefficients can be minimized via Geometric Programming [Boyd et al., 2007], and signomials with a single negative coefficient admit sum of squares-like relaxations [Chandrasekaran & Shah, 2016]. Our problem, a constrained posynomial maximization, is not in general a geometric program. Some work addresses this setting via monomial approximation [Pascual & Ben-Israel, 1970; Ecker, 1980], but, to our knowledge, our algorithm is the first that solves this problem to arbitrary accuracy.

#### Robust Optimization

Two prominent strategies of addressing uncertainty in parameters of optimization problems are stochastic and robust optimization. If the distribution of the parameters is known (stochastic optimization), formulations such as value-at-risk (VaR) and conditional value-at-risk (CVaR) [Rockafellar & Uryasev, 2000, 2002] apply. In contrast, robust optimization [Ben-Tal et al., 2009; Bertsimas et al., 2011] assumes that the parameters (of the cost function and constraints) can vary arbitrarily within a known confidence set $\mathcal{U}$, and the aim is to optimize the worst-case setting, i.e.,

$$\min_{y} \sup_{u, A, b \in \mathcal{U}} \{ g(y; u) \ \text{s.t.}\ Ay \leq b \}. \tag{4}$$

Here, we will only have uncertainty in the cost function.

In this paper we are principally concerned with robust maximization of the continuous influence function , but mention some results for the discrete case. While there exist results for robust and CVaR optimization of modular (linear) functions [Nikolova, 2010; Bertsimas & Sim, 2003], submodular objectives do not in general admit such optimization Maehara [2015], but variants admit approximations [Zhang et al., 2014]. The brittleness of submodular optimization under noise has been studied in [Balkanski et al., 2016, 2017; Hassidim & Singer, 2016].

Approximations for robust submodular and influence optimization have been studied in [Krause et al., 2008; He & Kempe, 2016; Chen et al., 2016; Lowalekar et al., 2016], where an adversary can pick among a finite set of objective functions or remove selected elements Orlin et al. [2016].

## 2 Robust and Stochastic Budget Allocation

The unknown parameters in Budget Allocation are the transmission probabilities $p_{st}$ or edge weights in a graph. If these are estimated from data, we may have posterior distributions or, a weaker assumption, confidence sets for the parameters. For ease of notation, we will work with the failure probabilities $x_{st} = 1 - p_{st}$ instead of the $p_{st}$ directly, and write $I(y; x)$ instead of $I(y; p)$.

### 2.1 Stochastic Optimization

If a (posterior) distribution of the parameters is known, a simple strategy is to use expectations. We place a uniform prior on $X_{st}$, and observe $n_{st}$ independent observations drawn from $\mathrm{Ber}(X_{st})$. If we observe $\alpha_{st}$ failures and $\beta_{st}$ successes, the resulting posterior distribution on the variable $X_{st}$ is $\mathrm{Beta}(1 + \alpha_{st}, 1 + \beta_{st})$. Given such a posterior, we may optimize

$$\max_{y \in \mathcal{Y}} \ I(y; \mathbb{E}[X]), \quad \text{or} \tag{5}$$

$$\max_{y \in \mathcal{Y}} \ \mathbb{E}[I(y; X)]. \tag{6}$$

###### Proposition 2.1.

Problems (5) and (6) are concave maximization problems over the (convex) set $\mathcal{Y}$ and can be solved exactly.

Concavity of (6) follows since it is an expectation over concave functions, and the problem can be solved by stochastic gradient ascent or by explicitly computing gradients.
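As a small illustration of objectives (5) and (6), the sketch below evaluates both for independent Beta posteriors on the failure probabilities. It assumes the standard conjugate update (a uniform prior plus observed failures and successes gives a Beta(1 + failures, 1 + successes) posterior) and uses the closed-form moment $\mathbb{E}[X^y] = \Gamma(a+y)\Gamma(a+b) / (\Gamma(a)\Gamma(a+b+y))$ for $X \sim \mathrm{Beta}(a, b)$. The counts and all function names are hypothetical.

```python
import math
import numpy as np

def beta_moment(a, b, y):
    """E[X^y] for X ~ Beta(a, b) and y >= 0, via log-gamma functions."""
    return math.exp(math.lgamma(a + y) + math.lgamma(a + b)
                    - math.lgamma(a) - math.lgamma(a + b + y))

def influence_at_mean(y, a, b):
    """Objective (5): plug the posterior mean a / (a + b) of each X_st into I."""
    mean = a / (a + b)
    return float(np.sum(1.0 - np.prod(mean ** y[:, None], axis=0)))

def expected_influence(y, a, b):
    """Objective (6): exact E[I(y; X)] when the X_st are independent."""
    S, T = a.shape
    total = 0.0
    for t in range(T):
        fail = 1.0
        for s in range(S):
            fail *= beta_moment(a[s, t], b[s, t], y[s])
        total += 1.0 - fail
    return total

# Hypothetical observation counts per edge (channels x people).
failures = np.array([[8.0, 5.0], [9.0, 2.0]])
successes = np.array([[2.0, 5.0], [1.0, 8.0]])
a, b = 1.0 + failures, 1.0 + successes   # Beta(1 + failures, 1 + successes)

y = np.array([1.0, 1.0])
obj5 = influence_at_mean(y, a, b)
obj6 = expected_influence(y, a, b)
```

With unit budgets the two objectives coincide, since $\mathbb{E}[X^1]$ is the posterior mean and the edges are independent. For other budgets they generally differ, which is why (5) and (6) are distinct problems.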

Merely maximizing expectation does not explicitly account for volatility and hence risk. One option is to include variance Ben-Tal & Nemirovski [2000]; Bertsimas et al. [2011]; Atamtürk & Narayanan [2008]:

$$\min_{y \in \mathcal{Y}} \ -\mathbb{E}[I(y; X)] + \varepsilon \sqrt{\mathrm{Var}(I(y; X))}, \tag{7}$$

but in our case this CVaR formulation seems difficult:

###### Fact 2.1.

For $y$ in the nonnegative orthant, the term $\sqrt{\mathrm{Var}(I(y; X))}$ need not be convex or concave, and need not be submodular or supermodular.

This observation does not rule out a solution, but the apparent difficulties further motivate a robust formulation that, as we will see, is amenable to optimization.

### 2.2 Robust Optimization

The focus of this work is the robust version of Budget Allocation, where we allow an adversary to arbitrarily set the parameters $x$ within an uncertainty set $\mathcal{X}$. This uncertainty set may result, for instance, from a known distribution, or simply assumed bounds. Formally, we solve

$$\max_{y \in \mathcal{Y}} \min_{x \in \mathcal{X}} I(y; x), \tag{8}$$

where $\mathcal{Y}$ is a convex set with an efficient projection oracle, and $\mathcal{X}$ is an uncertainty set containing an estimate $\hat{x}$. In the sequel, we use uncertainty sets $\mathcal{X} = \{x \in \mathrm{Box}(l, u) : R(x) \leq B\}$, where $R$ is a distance (or divergence) from the estimate $\hat{x}$, and $\mathrm{Box}(l, u)$ is the box $\prod_{(s,t) \in E} [l_{st}, u_{st}]$. The intervals $[l_{st}, u_{st}]$ can be thought of as either confidence intervals around $\hat{x}$, or, if $[l_{st}, u_{st}] = [0, 1]$, enforce that each $x_{st}$ is a valid probability.

Common examples of uncertainty sets used in Robust Optimization are Ellipsoidal and D-norm uncertainty sets Bertsimas et al. [2011]. Our algorithm in Section 3.1 applies to both.

Ellipsoidal uncertainty. The ellipsoidal or quadratic uncertainty set is defined by

$$\mathcal{X}^Q(\gamma) = \{ x \in \mathrm{Box}(0, 1) : (x - \hat{x})^T \Sigma^{-1} (x - \hat{x}) \leq \gamma \},$$

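where $\Sigma$ is the covariance of the random vector of probabilities distributed according to our Beta posteriors. In our case, since the distributions on each $x_{st}$ are independent, $\Sigma$ is actually diagonal. Writing $\Sigma = \mathrm{diag}(\sigma_{st}^2)$, we have

$$\mathcal{X}^Q(\gamma) = \Big\{ x \in \mathrm{Box}(0, 1) : \sum_{(s,t) \in E} R_{st}(x_{st}) \leq \gamma \Big\},$$

where $R_{st}(x) = (x - \hat{x}_{st})^2 \sigma_{st}^{-2}$.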

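D-norm uncertainty. The D-norm uncertainty set is similar to an $\ell_1$-ball around $\hat{x}$, and is defined as

$$\mathcal{X}^D(\gamma) = \Big\{ x : \exists\, c \in \mathrm{Box}(0, 1) \ \text{s.t.}\ x_{st} = \hat{x}_{st} + (u_{st} - \hat{x}_{st}) c_{st}, \ \sum_{(s,t) \in E} c_{st} \leq \gamma \Big\}.$$

Essentially, we allow an adversary to increase $\hat{x}_{st}$ up to some upper bound $u_{st}$, subject to some total budget $\gamma$ across all terms $x_{st}$. The set $\mathcal{X}^D(\gamma)$ can be rewritten as

$$\mathcal{X}^D(\gamma) = \Big\{ x \in \mathrm{Box}(\hat{x}, u) : \sum_{(s,t) \in E} R_{st}(x_{st}) \leq \gamma \Big\},$$

where $R_{st}(x_{st}) = (x_{st} - \hat{x}_{st}) / (u_{st} - \hat{x}_{st})$ is the fraction of the interval $[\hat{x}_{st}, u_{st}]$ we have used up in increasing $x_{st}$.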
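Both uncertainty sets share the form "box constraint plus a separable budget constraint", which the following sketch makes explicit as membership tests. It assumes a diagonal covariance with per-edge variances `sigma2` for the ellipsoidal set, and takes the D-norm regularizer to be the fraction of each interval used up; the function names and the flat-vector layout of the edges are illustrative assumptions.

```python
import numpy as np

def ellipsoid_R(x, x_hat, sigma2):
    """Sum over edges of (x - x_hat)^2 / sigma^2 (diagonal covariance assumed)."""
    return float(np.sum((x - x_hat) ** 2 / sigma2))

def dnorm_R(x, x_hat, u):
    """Sum over edges of the fraction of [x_hat, u] used up by x."""
    return float(np.sum((x - x_hat) / (u - x_hat)))

def in_ellipsoid(x, x_hat, sigma2, gamma):
    in_box = np.all((x >= 0.0) & (x <= 1.0))
    return bool(in_box and ellipsoid_R(x, x_hat, sigma2) <= gamma)

def in_dnorm(x, x_hat, u, gamma):
    in_box = np.all((x >= x_hat) & (x <= u))
    return bool(in_box and dnorm_R(x, x_hat, u) <= gamma)

# Two edges: estimates, upper bounds, variances, and a candidate adversarial point.
x_hat = np.array([0.5, 0.8])
u = np.array([0.9, 1.0])
sigma2 = np.array([0.04, 0.01])
x = np.array([0.7, 0.9])
```

In this toy instance the candidate uses half of each interval, so it lies in the D-norm set only once the adversary's budget is at least one.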

The min-max formulation has several merits: the model is not tied to a specific learning algorithm for the probabilities as long as we can choose a suitable confidence set. Moreover, this formulation allows us to fully hedge against a worst-case scenario.

## 3 Optimization Algorithm

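As noted above, the function $I(y; x)$ is concave as a function of $y$ for fixed $x$. As a pointwise minimum of concave functions, $\min_{x \in \mathcal{X}} I(y; x)$ is concave in $y$. Hence, if we can compute subgradients of this minimum, we can solve our max-min problem via the subgradient method, as outlined in Algorithm 1.

A subgradient at $y$ is given by the gradient of $I(y; x^*)$ with respect to $y$ for the minimizing $x^* \in \arg\min_{x \in \mathcal{X}} I(y; x)$, i.e., $\nabla_y I(y; x^*)$. Hence, we must be able to compute $x^*$ for any $y$. We also obtain a duality gap: for any $x' \in \mathcal{X}$ and $y' \in \mathcal{Y}$ we have

$$\min_{x \in \mathcal{X}} I(y'; x) \leq \max_{y \in \mathcal{Y}} \min_{x \in \mathcal{X}} I(y; x) \leq \max_{y \in \mathcal{Y}} I(y; x'). \tag{9}$$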

This means we can estimate the optimal value and use it in Polyak’s stepsize rule for the subgradient method Polyak [1987].
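Algorithm 1 is not reproduced in this section, so the following is only a rough sketch of the outer loop described above. It makes several simplifying assumptions that are not from the paper: the adversary chooses from a finite list of candidate failure-probability matrices (so the inner minimization is plain enumeration rather than constrained submodular minimization), the feasible set is $\{y \geq 0, \sum_s y(s) \leq C\}$, and a diminishing step size $1/\sqrt{k}$ replaces Polyak's rule. All names are hypothetical.

```python
import numpy as np

def influence(y, x):
    # I(y; x) with failure probabilities x[s, t] in (0, 1]; x = 1 means "no edge".
    return float(np.sum(1.0 - np.prod(x ** y[:, None], axis=0)))

def grad_y(y, x):
    # d/dy(s) of -prod_s x_st^y(s) equals -log(x_st) * prod_s x_st^y(s).
    fail = np.prod(x ** y[:, None], axis=0)
    return -(np.log(x) * fail[None, :]).sum(axis=1)

def project_budget(v, C):
    # Euclidean projection onto {y >= 0, sum(y) <= C}.
    w = np.maximum(v, 0.0)
    if w.sum() <= C:
        return w
    # Otherwise project onto the simplex {y >= 0, sum(y) = C} (sort-based rule).
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - C
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def robust_budget(candidates, C, iters=200):
    S = candidates[0].shape[0]
    y = np.full(S, C / S)
    best_y, best_val = y, -np.inf
    for k in range(1, iters + 1):
        # Inner problem by enumeration (a stand-in for the paper's inner solver).
        vals = [influence(y, x) for x in candidates]
        worst = int(np.argmin(vals))
        if vals[worst] > best_val:
            best_y, best_val = y.copy(), vals[worst]
        # Ascent step along the gradient at the worst-case parameters.
        g = grad_y(y, candidates[worst])
        y = project_budget(y + g / np.sqrt(k), C)
    return best_y, best_val
```

The loop tracks the best worst-case value seen so far because subgradient steps are not monotone; the returned value is the left-hand side of (9) evaluated at the returned budget.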

But is not convex in , and not even quasiconvex. For example, standard methods [Wainwright & Chiang, 2004, Chapter 12] imply that is not quasiconvex on . Moreover, the above-mentioned signomial optimization techniques do not apply for an exact solution either. So, it is not immediately clear that we can solve the inner optimization problem.

The key insight we will be using is that $I(y; x)$ has a different beneficial property: while not convex, $I(y; x)$ as a function of $x$ is continuous submodular.

###### Lemma 3.1.

Suppose we have $n$ differentiable, nonnegative functions $f_i$ of one variable, for $i = 1, \dots, n$, either all nonincreasing or all nondecreasing. Then, $f(x) = \prod_{i=1}^n f_i(x_i)$ is a continuous supermodular function of $x$.

###### Proof.

For $n = 1$, the resulting function is modular and therefore supermodular. In the case $n \geq 2$, we simply need to compute derivatives. The mixed derivatives are

$$\frac{\partial^2 f}{\partial x_i \partial x_j} = f_i'(x_i) f_j'(x_j) \cdot \prod_{k \neq i, j} f_k(x_k). \tag{10}$$

By monotonicity, $f_i'$ and $f_j'$ have the same sign, so their product is nonnegative, and since each $f_k$ is nonnegative, the entire expression is nonnegative. Hence, $f$ is continuous supermodular by Theorem 3.2 of [Topkis, 1978]. ∎

###### Corollary 3.1.

The influence function $I(y; x)$ defined in Section 2 is continuous submodular in $x$ over the nonnegative orthant, for each $y \geq 0$.

###### Proof.

Since submodularity is preserved under summation, it suffices to show that each function $I_t(y; x)$ is continuous submodular. By Lemma 3.1, since $z \mapsto z^{y(s)}$ is nonnegative and monotone nondecreasing for $y(s) \geq 0$, the product $\prod_{(s,t) \in E} x_{st}^{y(s)}$ is continuous supermodular in $x$. Flipping the sign and adding a constant term yields $I_t(y; x)$, which is hence continuous submodular. ∎
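The sign condition behind this corollary can be spot-checked numerically. The sketch below estimates mixed second partial derivatives of a single term $1 - \prod_s x_s^{y(s)}$ by central finite differences at random points; the budgets, the sampling range, and the helper names are arbitrary illustrative choices, and a numerical check of this kind is of course no substitute for the proof.

```python
import numpy as np

def influence_one_person(x, y):
    """One term I_t as a function of failure probabilities x, for fixed budgets y."""
    return 1.0 - np.prod(x ** y)

def mixed_partial(f, x, i, j, h=1e-4):
    """Central finite-difference estimate of d^2 f / (dx_i dx_j)."""
    e_i = np.zeros_like(x)
    e_i[i] = h
    e_j = np.zeros_like(x)
    e_j[j] = h
    return (f(x + e_i + e_j) - f(x + e_i - e_j)
            - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4.0 * h * h)

rng = np.random.default_rng(0)
y = np.array([0.5, 1.0, 2.5])
largest = -np.inf
for _ in range(100):
    x = rng.uniform(0.1, 0.9, size=3)
    for i in range(3):
        for j in range(i + 1, 3):
            value = mixed_partial(lambda z: influence_one_person(z, y), x, i, j)
            largest = max(largest, value)
# Continuous submodularity in x requires every off-diagonal entry to be <= 0.
```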

###### Conjecture 3.1.

Strong duality holds, i.e.

$$\max_{y \in \mathcal{Y}} \min_{x \in \mathcal{X}} I(y; x) = \min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} I(y; x). \tag{11}$$

If strong duality holds, then the duality gap in Equation (9) is zero at optimality. If $I(y; x)$ were quasiconvex in $x$, strong duality would hold by Sion’s min-max theorem, but this is not the case. In practice, we observe that the duality gap always converges to zero.

Bach [2015] demonstrates how to minimize a continuous submodular function $H(x)$ subject to box constraints $x \in \mathrm{Box}(l, u)$, up to an arbitrary suboptimality gap $\epsilon > 0$. The constraint set $\mathcal{X}$ in our Robust Budget Allocation problem, however, has box constraints with an additional constraint $R(x) \leq B$. This case is not addressed in any previous work. Fortunately, for a large class of functions $R$, there is still an efficient algorithm for continuous submodular minimization, which we present in the next section.

### 3.1 Constrained Continuous Submodular Function Minimization

We next address an algorithm for minimizing a monotone continuous submodular function $H$ subject to box constraints $x \in \mathrm{Box}(l, u)$ and a constraint $R(x) \leq B$:

$$\begin{aligned} \text{minimize} \quad & H(x) \\ \text{s.t.} \quad & R(x) \leq B, \quad x \in \mathrm{Box}(l, u). \end{aligned} \tag{12}$$

If $H$ and $R$ were convex, the constrained problem would be equivalent to solving, with the right Lagrange multiplier $\lambda^* \geq 0$:

$$\begin{aligned} \text{minimize} \quad & H(x) + \lambda^* R(x) \\ \text{s.t.} \quad & x \in \mathrm{Box}(l, u). \end{aligned} \tag{13}$$

Although $H$ and $R$ are not necessarily convex here, it turns out that a similar approach indeed applies. The main idea of our approach bears similarity with [Nagano et al., 2011] for the set function case, but our setting with continuous functions and various uncertainty sets is more general, and requires more argumentation. We outline our theoretical results here, and defer further implementation details and proofs to the appendix.

Following Bach [2015], we discretize the problem; for a sufficiently fine discretization, we will achieve arbitrary accuracy. Let be an interpolation mapping that maps the discrete set into via the componentwise interpolation functions . We say is -fine if for all , and we say the full interpolation function is -fine if each is -fine.

This mapping yields functions and via and . is lattice submodular (on the integer lattice). This construction leads to a reduction of Problem (12) to a submodular minimization problem over the integer lattice:

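$$\begin{aligned} \text{minimize} \quad & H^\delta(x) + \lambda R^\delta(x) \\ \text{s.t.} \quad & x \in \prod_{i=1}^n [k_i]. \end{aligned} \tag{14}$$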

Ideally, there should then exist a such that the associated minimizer yields a close to optimal solution for the constrained problem. Theorem 3.1 below states that this is indeed the case.

Moreover, a second benefit of submodularity is that we can find the entire solution path for Problem (14) by solving a single optimization problem.

###### Lemma 3.2.

Suppose $H$ is continuous submodular, and suppose the regularizer $R$ is strictly increasing and separable: $R(x) = \sum_{i=1}^n R_i(x_i)$. Then we can recover a minimizer $x(\lambda)$ for the induced discrete Problem (14) for any $\lambda$ by solving a single convex optimization problem.

The problem in question arises from a relaxation that extends in each coordinate to a function on distributions over the domain . These distributions are represented via their inverse cumulative distribution functions , which take the coordinate as input, and output the probability of exceeding . The function is an analogue of the Lovász extension of set functions to continuous submodular functions [Bach, 2015], it is convex and coincides with on lattice points.

Formally, this resulting single optimization problem is:

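$$\begin{aligned} \text{minimize} \quad & h_\downarrow(\rho) + \sum_{i=1}^n \sum_{x_i=1}^{k_i - 1} a_{i x_i}(\rho_i(x_i)) \\ \text{s.t.} \quad & \rho \in \prod_{i=1}^n \mathbb{R}^{k_i - 1}_\downarrow \end{aligned} \tag{15}$$

where $\mathbb{R}^{k}_\downarrow$ refers to the set of ordered vectors $z \in \mathbb{R}^k$ that satisfy $z_1 \geq z_2 \geq \dots \geq z_k$, the notation $\rho_i(x_i)$ denotes the $x_i$-th coordinate of the vector $\rho_i$, and the $a_{i x_i}$ are strictly convex functions given by

$$a_{i x_i}(t) = \frac{1}{2} t^2 \cdot \left[ R_i^\delta(x_i) - R_i^\delta(x_i - 1) \right]. \tag{16}$$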

Problem (15) can be solved by Frank-Wolfe methods [Frank & Wolfe, 1956; Dunn & Harshbarger, 1978; Lacoste-Julien, 2016; Jaggi, 2013]. This is because the greedy algorithm for computing subgradients of the Lovász extension can be generalized, and yields a linear optimization oracle for the dual of Problem (15). We detail the relationship between Problems (14) and (15), as well as how to implement the Frank-Wolfe methods, in Appendix C.

Let be the optimal solution for Problem (15). For any , we obtain a rounded solution for Problem (14) by thresholding: we set , or zero if for all . Each is the optimal solution for Problem (14) with . We use the largest parameterized solution that is still feasible, i.e. the solution where solves

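$$\begin{aligned} \min \quad & H^\delta(x(\lambda)) \\ \text{s.t.} \quad & \lambda \geq 0, \quad R^\delta(x(\lambda)) \leq B. \end{aligned} \tag{17}$$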

This can be found efficiently via binary search or a linear scan.
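The linear scan can be sketched as follows. This is only an illustration under stated assumptions: the thresholding rule is taken to be $x_i(\lambda) = \max\{j : \rho_i(j) \geq \lambda\}$ (zero if no entry qualifies), each $\rho_i$ is stored as a nonincreasing NumPy array, and $R^\delta$ is supplied as a Python callable on integer vectors. The names `threshold` and `best_feasible` are hypothetical.

```python
import numpy as np

def threshold(rho, lam):
    """x_i(lam): largest level j with rho_i[j - 1] >= lam, or 0 if there is none."""
    # Each rho_i is nonincreasing, so counting entries >= lam gives that level.
    return np.array([int(np.sum(r >= lam)) for r in rho])

def best_feasible(rho, R, B):
    """Scan candidate thresholds in ascending order; return the first feasible one."""
    # x(lam) only changes at entries of rho, so those (plus 0) are the candidates.
    candidates = np.unique(np.concatenate([np.concatenate(rho), [0.0]]))
    for lam in candidates[candidates >= 0.0]:
        x = threshold(rho, lam)
        if R(x) <= B:
            return float(lam), x
    # A threshold above every entry of rho gives the all-zeros point.
    return None, np.zeros(len(rho), dtype=int)

# Toy instance: two coordinates with 3 and 2 positive levels, R(x) = sum of levels.
rho = [np.array([3.0, 2.0, 0.5]), np.array([2.5, 1.0])]
lam, x = best_feasible(rho, R=lambda z: float(np.sum(z)), B=3.0)
```

Since $x(\lambda)$ shrinks coordinate-wise as $\lambda$ grows and $R$ is increasing, the first feasible threshold in ascending order yields the largest feasible thresholded point; binary search over the sorted candidates would work equally well.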

###### Theorem 3.1.

Let be continuous submodular and monotone decreasing, with -Lipschitz constant , and let be strictly increasing and separable. Assume all entries of the optimal solution of Problem (15) are distinct. Let be the thresholding corresponding to the optimal solution of Problem (17), mapped back into the original continuous domain . Then is feasible for the continuous Problem (12), and is a -approximate solution:

$$H(x') \leq 2G\delta + \min_{x \in \mathrm{Box}(l, u),\ R(x) \leq B} H(x).$$

Theorem 3.1 implies an algorithm for solving Problem (12) to -optimality: (1) set , (2) compute which solves Problem (15), (3) find the optimal thresholding of by determining the smallest for which , and (4) map back into continuous space via the interpolation mapping .

Optimality Bounds. Theorem 3.1 is proved by comparing and to the optimal solution on the discretized mesh

$$x_d^* \in \operatorname*{argmin}_{x \in \prod_{i=1}^n [k_i] \,:\, R^\delta(x) \leq B} H^\delta(x).$$

Beyond the theoretical guarantee of Theorem 3.1, for any problem instance and candidate solution , we can compute a bound on the gap between and . The following two bounds are proved in the appendix:

1. We can generate a discrete point $x(\lambda_+)$ satisfying

$$H(x') \leq \left[ H(x') - H^\delta(x(\lambda_+)) \right] + H^\delta(x_d^*).$$

2. The Lagrangian yields the bound

$$H(x') \leq \lambda^* (B - R(x')) + H^\delta(x_d^*).$$

Improvements. The requirement in Theorem 3.1 that the elements of be distinct may seem somewhat restrictive, but as long as has distinct elements in the neighborhood of our particular , this bound still holds. We see in Section 4.1.1 that in practice, almost always has distinct elements in the regime we care about, and the bounds of Remark 3.1 are very good.

If is DR-submodular and is affine in each coordinate, then Problem (14) can be represented more compactly via the reduction of Ene & Nguyen [2016], and hence problem (12) can be solved more efficiently. In particular, the influence function is DR-submodular in when for each , or .

### 3.2 Application to Robust Budget Allocation

The above algorithm directly applies to Robust Allocation with the uncertainty sets in Section 2.2. The ellipsoidal uncertainty set corresponds to the constraint that with , and . By the monotonicity of , there is never incentive to reduce any below , so we can replace with . On this interval, each is strictly increasing, and Theorem 3.1 applies.

For D-norm sets, we have . Since each is monotone, Theorem 3.1 applies.

Runtime and Alternatives. Since the core algorithm is Frank-Wolfe, it is straightforward to show that Problem (15) can be solved to -suboptimality in time , where is the minimum derivative of the functions . If has distinct elements separated by , then choosing results in an exact solution to (14) in time .

Noting that is submodular for all , one could instead perform binary search over , each time converting the objective into a submodular set function via Birkhoff’s theorem and solving submodular minimization e.g. via one of the recent fast methods [Chakrabarty et al., 2017; Lee et al., 2015]. However, we are not aware of a practical implementation of the algorithm in [Lee et al., 2015]. The algorithm in [Chakrabarty et al., 2017] yields a solution in expectation. This approach also requires care in the precision of the search over , whereas our approach searches directly over the elements of .

## 4 Experiments

We evaluate our Robust Budget Allocation algorithm on both synthetic test data and a real-world bidding dataset from Yahoo! Webscope to demonstrate that our method yields real improvements. For all experiments, we used Algorithm 1 as the outer loop. For the inner submodular minimization step, we implemented the pairwise Frank-Wolfe algorithm of Lacoste-Julien & Jaggi [2015]. In all cases, the feasible set of budgets is where the specific budget depends on the experiment. Our code is available at git.io/vHXkO.

### 4.1 Synthetic

On the synthetic data, we probe two questions: (1) how often does the distinctness condition of Theorem 3.1 hold, so that we are guaranteed an optimal solution; and (2) what is the gain of using a robust versus non-robust solution in an adversarial setting? For both settings, we set and and discretize with . We generated true probabilities , created Beta posteriors, and built both Ellipsoidal uncertainty sets and D-norm sets .

#### Optimality

Theorem 3.1 and Remark 3.1 demand that the values be distinct at our chosen Lagrange multiplier and, under this condition, guarantee optimality. We illustrate this in four examples: for Ellipsoidal or a D-norm uncertainty set, and a total influence budget . Figure 3 shows all elements of in sorted order, as well as a horizontal line indicating our Lagrange multiplier which serves as a threshold. Despite some plateaus, the entries are distinct in most regimes, in particular around , the regime that is needed for our results. Moreover, in practice (on the Yahoo data) we observe later in Figure 3 that both solution-dependent bounds from Remark 3.1 are very good, and all solutions are optimal within a very small gap.

#### Robustness and Quality

Next, we probe the effect of a robust versus non-robust solution for different uncertainty sets and budgets of the adversary. We compare our robust solution with using a point estimate for , i.e., , treating estimates as ground truth, and the stochastic solution as per Section 2.1. These two optimization problems were solved via standard first-order methods using TFOCS Becker et al. [2011].

Figure 2 demonstrates that indeed, the alternative budgets are sensitive to the adversary and the robustly-chosen budget performs better, even in cases where the other budgets achieve zero influence. When the total budget is large, performs nearly as well as , but when resources are scarce ( is small) and the actual choice seems to matter more, performs far better.

### 4.2 Yahoo! data

To evaluate our method on real-world data, we formulate a Budget Allocation instance on advertiser bidding data from Yahoo! Webscope. This dataset logs bids on 1000 different phrases by advertising accounts. We map the phrases to channels and the accounts to customers , with an edge between and if a corresponding bid was made. For each pair , we draw the associated transmission probability uniformly from . We bias these towards zero because we expect people not to be easily influenced by advertising in the real world. We then generate an estimate and build up a posterior by generating samples from , where is the number of bids between and in the dataset.

This transformation yields a bipartite graph with , , and more than 50,000 edges that we use for Budget Allocation. In our experiments, the typical gap between the naive and robust was 100-500 expected influenced people. We plot convergence of the outer loop in Figure 3, where we observe fast convergence of both primal influence value and the dual bound.

### 4.3 Comparison to first-order methods

Given the success of first-order methods on nonconvex problems in practice, it is natural to compare these to our method for finding the worst-case vector . On one of our Yahoo problem instances with D-norm uncertainty set, we compared our submodular minimization scheme to Frank-Wolfe with fixed stepsize as in [Lacoste-Julien, 2016], implementing the linear oracle using MOSEK [MOSEK ApS, 2015]. Interestingly, from various initializations, Frank-Wolfe finds an optimal solution, as verified by comparing to the guaranteed solution of our algorithm. Note that, due to non-convexity, there are no formal guarantees for Frank-Wolfe to be optimal here, motivating the question of global convergence properties of Frank-Wolfe in the presence of submodularity.

It is important to note that there are many cases where first-order methods are inefficient or do not apply to our setup. These methods require either a projection oracle (PO) onto or linear optimization oracle (LO) over the feasible set defined by , and . The D-norm set admits a LO via linear programming, but we are not aware of any efficient LO for Ellipsoidal uncertainty, nor PO for either set, that does not require quadratic programming. Even more, our algorithm applies for nonconvex functions which induce nonconvex feasible sets . Such nonconvex sets may not even admit a unique projection, while our algorithm achieves provable solutions.

## 5 Conclusion

We address the issue of uncertain parameters (or, model mis-specification) in Budget Allocation or Bipartite Influence Maximization [Alon et al., 2012] from a robust optimization perspective. The resulting Robust Budget Allocation is a nonconvex-concave saddle point problem. Although the inner optimization problem is nonconvex, we show how continuous submodularity can be leveraged to solve the problem to arbitrary accuracy , as can be verified with the proposed bounds on the duality gap. In particular, our approach extends continuous submodular minimization methods [Bach, 2015] to more general constraint sets, introducing a mechanism to solve a new class of constrained nonconvex optimization problems. We confirm on synthetic and real data that our method finds high-quality solutions that are robust to parameters varying arbitrarily in an uncertainty set, and scales up to graphs with over 50,000 edges.

There are many compelling directions for further study. The uncertainty sets we use are standard in the robust optimization literature, but have not been applied to e.g. Robust Influence Maximization; it would be interesting to generalize our ideas to general graphs. Finally, despite the inherent nonconvexity of our problem, first-order methods are often able to find a globally optimal solution. Explaining this phenomenon requires further study of the geometry of constrained monotone submodular minimization.

### Acknowledgements

We thank the anonymous reviewers for their helpful suggestions. We also thank MIT Supercloud and the Lincoln Laboratory Supercomputing Center for providing computational resources. This research was conducted with Government support under and awarded by DoD, Air Force Office of Scientific Research, National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a, and also supported by NSF CAREER award 1553284.

## Appendix A Worst-Case Approximation Ratio versus True Worst-Case

Consider the function $f(x;\theta)$ defined on $\{0,1\} \times \{0,1\}$, with values given by:

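$$
f(x;0) = \begin{cases} 1 & x = 0 \\ 0.6 & x = 1 \end{cases}, \qquad f(x;1) = \begin{cases} 1 & x = 0 \\ 2 & x = 1 \end{cases}. \tag{18}
$$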

We wish to choose $x$ to maximize $f(x;\theta)$ robustly with respect to adversarial choices of $\theta$. If $\theta$ were fixed, we could directly choose $x^*_\theta$ to maximize $f(x;\theta)$. In particular, $x^*_0 = 0$ and $x^*_1 = 1$. Of course, we want to deal with worst-case $\theta$. One option is to maximize the worst-case approximation ratio:

$$
\max_x \min_\theta \frac{f(x;\theta)}{f(x^*_\theta;\theta)}. \tag{19}
$$

One can verify that the best $x$ according to this criterion is $x = 1$, with worst-case approximation ratio 0.6 and worst-case function value 0.6. In this paper, we optimize the worst-case of the actual function value:

$$
\max_x \min_\theta f(x;\theta). \tag{20}
$$

This criterion will select $x = 0$, which has a worse worst-case approximation ratio of 0.5, but actually guarantees a function value of 1, significantly better than the 0.6 achieved by the other formulation of robustness.
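
The comparison between criteria (19) and (20) can be checked by brute force. The following is a minimal sketch that hard-codes the values from (18); the names `F`, `worst_case_ratio` and `worst_case_value` are illustrative and do not come from the paper.

```python
# Values from Eq. (18), stored as F[theta][x] = f(x; theta).
F = {
    0: {0: 1.0, 1: 0.6},
    1: {0: 1.0, 1: 2.0},
}
XS = (0, 1)
THETAS = (0, 1)


def best_value(theta):
    """Value of f(x*_theta; theta), the optimum when theta is known."""
    return max(F[theta][x] for x in XS)


def worst_case_ratio(x):
    """Inner minimum of criterion (19) for a fixed x."""
    return min(F[theta][x] / best_value(theta) for theta in THETAS)


def worst_case_value(x):
    """Inner minimum of criterion (20) for a fixed x."""
    return min(F[theta][x] for theta in THETAS)


# Maximizers of the two robust criteria.
x_ratio = max(XS, key=worst_case_ratio)
x_value = max(XS, key=worst_case_value)

print("criterion (19) picks x =", x_ratio,
      "ratio:", worst_case_ratio(x_ratio),
      "value:", worst_case_value(x_ratio))
print("criterion (20) picks x =", x_value,
      "ratio:", worst_case_ratio(x_value),
      "value:", worst_case_value(x_value))
```

Enumerating both choices of $x$ makes the trade-off explicit: the ratio criterion gives up guaranteed value in exchange for a better relative guarantee.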

## Appendix B DR-submodularity and L♮-convexity

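A function $f$ is $L^\natural$-convex if it satisfies a discrete version of midpoint convexity, i.e. for all $x, y$ it holds that

$$
f(x) + f(y) \geq f\left( \left\lceil \frac{x+y}{2} \right\rceil \right) + f\left( \left\lfloor \frac{x+y}{2} \right\rfloor \right). \tag{21}
$$
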
###### Remark B.1.

An $L^\natural$-convex function need not be DR-submodular, and vice versa. Hence algorithms for optimizing one type may not apply to the other.

###### Proof.

Consider and , both defined on . The function is DR-submodular but violates discrete midpoint convexity for the pair of points and , while is $L^\natural$-convex but does not have diminishing returns in either dimension. ∎
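
Both properties can be tested exhaustively on a small lattice. The sketch below does so on the grid $\{0,1,2\}^2$, which is an assumed domain chosen for illustration. The two example functions, `concave_of_sum` and `separable_convex`, are also illustrative choices and are not necessarily the functions used in the proof above.

```python
from itertools import product

K = 3                                      # grid values 0, 1, 2 per coordinate
GRID = list(product(range(K), repeat=2))   # the lattice {0,1,2}^2


def is_midpoint_convex(f):
    """Check discrete midpoint convexity, Eq. (21), for all pairs on GRID."""
    for x, y in product(GRID, repeat=2):
        up = tuple((a + b + 1) // 2 for a, b in zip(x, y))   # coordinatewise ceil
        down = tuple((a + b) // 2 for a, b in zip(x, y))     # coordinatewise floor
        if f(x) + f(y) < f(up) + f(down):
            return False
    return True


def is_dr_submodular(f):
    """Check diminishing returns: for x <= y, gains at x dominate gains at y."""
    for x, y in product(GRID, repeat=2):
        if not all(a <= b for a, b in zip(x, y)):
            continue
        for i in range(2):
            if y[i] + 1 >= K:
                continue  # y + e_i would leave the grid
            x_up = x[:i] + (x[i] + 1,) + x[i + 1:]
            y_up = y[:i] + (y[i] + 1,) + y[i + 1:]
            if f(x_up) - f(x) < f(y_up) - f(y):
                return False
    return True


def concave_of_sum(x):
    """Illustrative: a concave function of x1 + x2."""
    return -(x[0] + x[1]) ** 2


def separable_convex(x):
    """Illustrative: a separable convex function."""
    return x[0] ** 2 + x[1] ** 2


for name, f in [("concave_of_sum", concave_of_sum),
                ("separable_convex", separable_convex)]:
    print(name, "DR-submodular:", is_dr_submodular(f),
          "midpoint convex:", is_midpoint_convex(f))
```

For `concave_of_sum`, the pair $(0,0)$ and $(2,2)$ violates (21), since $0 - 16 < -4 - 4$. For `separable_convex`, the gain from increasing the first coordinate grows from 1 to 3, so returns are not diminishing.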

Intuitively speaking, $L^\natural$-convex functions look like discretizations of convex functions. The continuous objective function we consider need not be convex, hence its discretization need not be $L^\natural$-convex, and we cannot use those tools. However, in some regimes (namely if each ), it happens that is DR-submodular in .

## Appendix C Constrained Continuous Submodular Function Minimization

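Define $\mathbb{R}^k_\downarrow$ to be the set of vectors $\rho \in \mathbb{R}^k$ which are monotone nonincreasing, i.e. $\rho(1) \geq \rho(2) \geq \dots \geq \rho(k)$. As in the main text, define $[k] = \{0, 1, \dots, k-1\}$. One of the key results from Bach [2015] is that an arbitrary submodular function $H$ defined on $\prod_{i=1}^n [k_i]$ can be extended to a particular convex function $h_\downarrow$ so that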

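$$
\begin{array}{ll} \text{minimize} & H(x) \\ \text{s.t.} & x \in \prod_{i=1}^n [k_i] \end{array}
\quad \Leftrightarrow \quad
\begin{array}{ll} \text{minimize} & h_\downarrow(\rho) \\ \text{s.t.} & \rho \in \prod_{i=1}^n \mathbb{R}^{k_i - 1}_\downarrow. \end{array} \tag{22}
$$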

Moreover, Theorem 4 from Bach [2015] states that, if $a_{i x_i}$ are strictly convex functions for all $i$ and each $x_i$, then the two problems

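$$
\begin{array}{ll} \text{minimize} & H(x) + \sum_{i=1}^n \sum_{y_i=1}^{x_i} a'_{i y_i}(\lambda) \\ \text{s.t.} & x \in \prod_{i=1}^n [k_i] \end{array} \tag{23}
$$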

and

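$$
\begin{array}{ll} \text{minimize} & h_\downarrow(\rho) + \sum_{i=1}^n \sum_{x_i=1}^{k_i - 1} a_{i x_i}[\rho_i(x_i)] \\ \text{s.t.} & \rho \in \prod_{i=1}^n \mathbb{R}^{k_i - 1}_\downarrow \end{array} \tag{24}
$$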

are equivalent. In particular, one recovers a solution to Problem (23) for any $\lambda$ just as alluded to in Lemma 3.2: find $\rho^*$ which solves Problem (24) and, for each component $i$, choose $x_i$ to be the maximal value for which $\rho^*_i(x_i) \geq \lambda$.
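
The recovery step is a thresholding of the monotone vectors. The sketch below assumes the rule is "the largest $x_i$ with $\rho_i(x_i) \geq \lambda$", and further assumes the convention $x_i = 0$ when no entry reaches the threshold. The function name `recover_solution` and the numeric values of `rho` are hypothetical; they are not obtained by solving Problem (24).

```python
def recover_solution(rho, lam):
    """Threshold each nonincreasing vector rho_i at level lam.

    rho[i][j] stores rho_i(j + 1) for j = 0, ..., k_i - 2.
    Returns x with x_i in {0, ..., k_i - 1}.
    """
    x = []
    for rho_i in rho:
        x_i = 0  # assumed convention when no entry satisfies rho_i(x_i) >= lam
        for level, value in enumerate(rho_i, start=1):
            if value >= lam:
                x_i = level  # rho_i is nonincreasing, so the last hit is the largest
        x.append(x_i)
    return x


# Hypothetical monotone vectors for n = 2 coordinates with k_1 = 4 and k_2 = 3.
rho = [[3.0, 1.5, 0.2], [0.9, 0.1]]

for lam in (5.0, 1.0, 0.1):
    print("lambda =", lam, "-> x =", recover_solution(rho, lam))
```

Because each $\rho_i$ is nonincreasing, the recovered $x(\lambda)$ can only grow coordinatewise as $\lambda$ decreases, so a single solution of the convex problem yields the whole path of solutions.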

### C.1 Proof of Lemma 3.2

###### Proof.

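The discretized form $R^\delta$ of the regularizer is also separable and can be written $R^\delta(x) = \sum_{i=1}^n R_i^\delta(x_i)$. For each $i$ and each $y_i$ with $1 \leq y_i \leq k_i - 1$, define $a_{i y_i}(t) = \frac{1}{2} t^2 \left( R_i^\delta(y_i) - R_i^\delta(y_i - 1) \right)$, so that $a'_{i y_i}(t) = t \left( R_i^\delta(y_i) - R_i^\delta(y_i - 1) \right)$. Since we assumed each $R_i$ is strictly increasing, the coefficient of $t^2$ in each $a_{i y_i}$ is strictly positive, so that each $a_{i y_i}$ is strictly convex. Then,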

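$$
\begin{aligned}
\lambda R_i^\delta(x_i) &= \lambda \cdot \left[ R_i^\delta(0) + \sum_{y_i=1}^{x_i} \left( R_i^\delta(y_i) - R_i^\delta(y_i - 1) \right) \right] && \text{(25)} \\
&= \lambda R_i^\delta(0) + \sum_{y_i=1}^{x_i} a'_{i y_i}(\lambda), && \text{(26)}
\end{aligned}
$$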

so that the discretized version of the minimization problem can be written as

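$$
\begin{array}{ll} \text{minimize} & H^\delta(x) + \lambda R^\delta(0) + \sum_{i=1}^n \sum_{y_i=1}^{x_i} a'_{i y_i}(\lambda) \\ \text{s.t.} & x \in \prod_{i=1}^n [k_i]. \end{array} \tag{27}
$$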

Since the term $\lambda R^\delta(0)$ does not depend on the variable $x$, this minimization is equivalent to

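$$
\begin{array}{ll} \text{minimize} & H^\delta(x) + \sum_{i=1}^n \sum_{y_i=1}^{x_i} a'_{i y_i}(\lambda) \\ \text{s.t.} & x \in \prod_{i=1}^n [k_i]. \end{array} \tag{28}
$$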

This problem is in the precise form where we can apply the preceding equivalence result between Problems (23) and (24), so we are done. ∎
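
The telescoping identity in (25) and (26) is easy to confirm numerically. The following is a small sketch, reading (25) and (26) as $a'_{i y_i}(\lambda) = \lambda \left( R_i^\delta(y_i) - R_i^\delta(y_i - 1) \right)$. The regularizer values in `R_i` are hypothetical, chosen only to be strictly increasing, and the function names are illustrative.

```python
def a_prime(R_i, y, lam):
    """a'_{i y}(lam) = lam * (R_i(y) - R_i(y - 1)), as read off from (25)-(26)."""
    return lam * (R_i[y] - R_i[y - 1])


def telescoped(R_i, x_i, lam):
    """Right-hand side of Eq. (26): lam * R_i(0) plus the sum of increments."""
    return lam * R_i[0] + sum(a_prime(R_i, y, lam) for y in range(1, x_i + 1))


# Hypothetical strictly increasing values R_i^delta(0), ..., R_i^delta(3).
R_i = [0.5, 1.0, 2.5, 4.0]
lam = 0.7

for x_i in range(len(R_i)):
    lhs = lam * R_i[x_i]
    rhs = telescoped(R_i, x_i, lam)
    print(x_i, lhs, rhs)

# Strict monotonicity of R_i makes every increment, and hence every
# quadratic coefficient of a_{i y}, strictly positive.
increments = [R_i[y] - R_i[y - 1] for y in range(1, len(R_i))]
```

The positive increments are exactly what makes each $a_{i y_i}$ strictly convex, which is the hypothesis needed to invoke the equivalence between Problems (23) and (24).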

### C.2 Proof of Theorem 3.1

###### Proof.

The general idea of this proof is to first show that the integer-valued point $x^*_d$ which solves

$$
x^*_d \in \operatorname*{argmin}_{x \in \prod_{i=1}^n [k_i] \,:\, R^\delta(x) \leq B} H^\delta(x)
$$

is also nearly a minimizer of the continuous version of the problem, due to the fineness of the discretization. Then, we show that the solutions traced out by $x(\lambda)$ get very close to $x^*_d$. These two results are then combined via the triangle inequality.

#### Continuous and Discrete Problems

We begin by proving that

$$
H^\delta(x^*_d) \leq G\delta + \min_{x \in \mathcal{X} \,:\, R(x) \leq B} H(x). \tag{29}
$$

Consider . If corresponds to an integral point in the discretized domain, then and we are done. Else, has at least one non-integral coordinate. By rounding coordinatewise, we can construct a set so that . By monotonicity, there must be some with , i.e. is feasible for the original continuous problem. By construction, since the discretization given by is -fine, we must have . Applying the Lipschitz property of and the optimality of , we have

$$
G\delta \geq H(A(x^i)) - H(x^*) = H^\delta(x^i) - H(x^*) \geq H^\delta(x^*_d) - H(x^*),
$$

from which (29) follows.

#### Discrete and Parameterized Discrete Problems

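Define $\lambda_-$ and $\lambda_+$ by

$$
\lambda_- \in \operatorname*{argmin}_{\lambda \geq 0 \,:\, R^\delta(x(\lambda)) \leq B} H^\delta(x(\lambda)) \quad \text{and} \quad \lambda_+ \in \operatorname*{argmax}_{\lambda \geq 0 \,:\, R^\delta(x(\lambda)) \geq B} H^\delta(x(\lambda)).
$$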

The next step in proving our suboptimality bound is to prove that

$$
H^\delta(x(\lambda_+)) \leq H^\delta(x^*_d) \leq H^\delta(x(\lambda_-)), \tag{30}
$$

from which it will follow that

$$
H^\delta(x(\lambda_-)) \leq G\delta + H^\delta(x^*_d).
$$

We begin by stating the min-max inequality, i.e. weak duality:

 minx∈∏ni=1[ki]:Rδ(x)≤BHδ(x) =minx∈∏ni=1[ki]maxλ≥0{Hδ(x)+λ(Rδ(x)−B)} (31) ≥maxλ≥0minx∈∏ni=1[ki]{Hδ(x)+λ(Rδ(x)−B)} (32) =maxλ≥0{Hδ(x(λ))+λ(Rδ(x(λ))−B)} (33) ≥maxλ≥0:Rδ(x(λ))≥B{Hδ(x(λ))+λ(Rδ(x(λ))−B)} (34) ≥maxλ≥0:Rδ(x(λ))≥BH