Parallel Algorithm for Non-Monotone DR-Submodular Maximization
Abstract
In this work, we give a new parallel algorithm for the problem of maximizing a non-monotone diminishing returns submodular function subject to a cardinality constraint. For any desired accuracy , our algorithm achieves a  approximation using  parallel rounds of function evaluations. The approximation guarantee nearly matches the best guarantee known for the problem in the sequential setting, and the number of parallel rounds is nearly optimal for any constant . Previous algorithms achieve worse approximation guarantees using  parallel rounds. Our experimental evaluation suggests that our algorithm obtains solutions whose objective value nearly matches the value obtained by state-of-the-art sequential algorithms, and that it outperforms previous parallel algorithms in the number of parallel rounds, iterations, and solution quality.
1 Introduction
In this paper, we study parallel algorithms for the problem of maximizing a non-monotone DR-submodular function subject to a single cardinality constraint (a DR-submodular function is a continuous function with the diminishing returns property: if  coordinate-wise then  coordinate-wise). The problem is a generalization of submodular maximization subject to a cardinality constraint. Many recent works have shown that DR-submodular maximization has a wide range of applications beyond submodular maximization. These applications include maximum a posteriori (MAP) inference for determinantal point processes (DPP), mean-field inference in log-submodular models, quadratic programming, and revenue maximization in social networks [16, 13, 6, 14, 17, 5, 4].
The problem of maximizing a DR-submodular function subject to a convex constraint is a notable example of a non-convex optimization problem that can be solved with provable approximation guarantees. The continuous Greedy algorithm [18], developed in the context of the multilinear relaxation framework, applies more generally to maximizing DR-submodular functions that are monotone increasing (if $x \le y$ coordinate-wise then $F(x) \le F(y)$). Chekuri et al. [7] developed algorithms for both monotone and non-monotone DR-submodular maximization subject to packing constraints that are based on the continuous Greedy and multiplicative weights update framework. The work [5] generalized the continuous Greedy algorithm for submodular functions to the DR-submodular case and developed Frank-Wolfe-style algorithms for maximizing a non-monotone DR-submodular function subject to general convex constraints.
A significant drawback of these algorithms is that they are inherently sequential and adaptive. In fact, the highly adaptive nature of these algorithms goes back to the classical greedy algorithm for submodular functions: the algorithm sequentially selects the next element based on its marginal gain on top of the previously selected elements. In certain settings, such as feature selection [15], evaluating the objective function is a time-consuming procedure and the main bottleneck of the optimization algorithm; therefore, parallelization is a must. Recent lines of work have focused on addressing these shortcomings and understanding the trade-offs between approximation guarantee, parallelization, and adaptivity. Starting with the work of Balkanski and Singer [3], there have been very recent efforts to understand the trade-off between approximation guarantee and adaptivity for submodular maximization [3, 9, 2, 12, 8, 1]. The adaptivity of an algorithm is the number of sequential rounds of queries it makes to the evaluation oracle of the function, where in every round the algorithm is allowed to make polynomially many parallel queries. Recently, the work [11] gave an algorithm for maximizing a submodular function subject to a cardinality constraint in  rounds with a  approximation. For the general setting of DR-submodular functions with packing constraints, the work [10] gave an algorithm with  rounds and a  approximation. In the special case of a single cardinality constraint, this algorithm uses  rounds.
In this work, we develop a new algorithm for DR-submodular maximization subject to a single cardinality constraint that uses  rounds of adaptivity and obtains a  approximation. For constant , the number of rounds is almost a quadratic improvement, from  in the previous work to the nearly optimal  rounds.
Theorem 1.
Let  be a DR-submodular function and . For every , there is an algorithm for the problem with the following guarantees:

The algorithm is deterministic if provided oracle access for evaluating and its gradient ;

The algorithm achieves an approximation guarantee of ;

The number of rounds of adaptivity is .
2 Preliminaries
In this paper, we consider nonnegative functions $F: [0, 1]^n \to \mathbb{R}_{\ge 0}$ that are diminishing returns submodular (DR-submodular). A function $F$ is DR-submodular if, for all $x \le y$ (where $\le$ is coordinate-wise), every coordinate $i \in [n]$, and every $\delta > 0$ such that $x + \delta e_i$ and $y + \delta e_i$ are still in $[0, 1]^n$, it holds that
$$F(x + \delta e_i) - F(x) \ge F(y + \delta e_i) - F(y),$$
where $e_i$ is the $i$-th basis vector, i.e., the vector whose $i$-th entry is $1$ and all other entries are $0$.
If $F$ is differentiable, $F$ is DR-submodular if and only if $\nabla F(x) \ge \nabla F(y)$ for all $x \le y$ in $[0, 1]^n$. If $F$ is twice-differentiable, $F$ is DR-submodular if and only if all the entries of the Hessian are nonpositive, i.e., $\frac{\partial^2 F}{\partial x_i \partial x_j}(x) \le 0$ for all $x \in [0, 1]^n$ and all $i, j \in [n]$.
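To make these characterizations concrete, the following sketch (Python with NumPy; the quadratic objective and all names are illustrative assumptions of ours, not taken from the paper) numerically checks both the gradient condition and the nonpositive-Hessian condition for a quadratic function $F(x) = b^\top x + \tfrac{1}{2} x^\top H x$ with an entrywise nonpositive matrix $H$.

import numpy as np

rng = np.random.default_rng(0)
n = 5

# Illustrative quadratic F(x) = b^T x + 0.5 * x^T H x on [0, 1]^n.
# H is symmetric with nonpositive entries, so the Hessian of F is entrywise
# nonpositive and F is DR-submodular; since the gradient b + H x can change
# sign, F need not be monotone.
H = -rng.random((n, n))
H = (H + H.T) / 2
b = rng.random(n)

def grad_F(x):
    return b + H @ x

# Gradient characterization: x <= y coordinate-wise should give grad F(x) >= grad F(y).
x = 0.5 * rng.random(n)
y = x + 0.5 * rng.random(n)          # y >= x coordinate-wise, still in [0, 1]^n
assert np.all(grad_F(x) >= grad_F(y) - 1e-12)

# Hessian characterization: every entry of the Hessian is nonpositive.
assert np.all(H <= 0)
print("quadratic example satisfies both DR-submodularity checks")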
For simplicity, throughout the paper, we assume that $F$ is differentiable. We assume that we are given black-box access to an oracle for evaluating $F$ and its gradient $\nabla F$. It is convenient to extend the function to  as follows: , where .
An example of a DR-submodular function is the multilinear extension of a submodular set function. The multilinear extension of a submodular function $f: 2^{[n]} \to \mathbb{R}_{\ge 0}$ is the function $F: [0, 1]^n \to \mathbb{R}_{\ge 0}$ defined as follows:
$$F(x) = \mathbb{E}[f(R(x))] = \sum_{S \subseteq [n]} f(S) \prod_{i \in S} x_i \prod_{i \notin S} (1 - x_i),$$
where $R(x)$ is a random subset of $[n]$ in which each element $i$ is included independently at random with probability $x_i$.
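To make the definition concrete, here is a small Monte Carlo sketch (Python with NumPy; the coverage function, weights, and sample count are hypothetical choices of ours) that estimates the multilinear extension of a toy submodular set function by sampling the random set $R(x)$.

import numpy as np

rng = np.random.default_rng(1)

# A small illustrative submodular set function: weighted coverage.
# Element i covers a set of items, and f(S) is the total weight of the items
# covered by the elements in S.
covers = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]
weights = np.array([1.0, 2.0, 1.5, 0.5])

def f(S):
    covered = set().union(*(covers[i] for i in S)) if S else set()
    return float(weights[list(covered)].sum())

def multilinear_extension(x, num_samples=20000):
    # Monte Carlo estimate of F(x) = E[f(R(x))], where R(x) contains each
    # element i independently with probability x_i.
    total = 0.0
    for _ in range(num_samples):
        R = [i for i in range(len(x)) if rng.random() < x[i]]
        total += f(R)
    return total / num_samples

x = np.array([0.5, 0.25, 0.75, 0.1])
print("estimated F(x):", multilinear_extension(x))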
Basic notation. We use lower-case letters, e.g. $x$, to denote vectors in $\mathbb{R}^n$. We use the following vector operations: $x \vee y$ is the vector whose $i$-th coordinate is $\max\{x_i, y_i\}$; $x \wedge y$ is the vector whose $i$-th coordinate is $\min\{x_i, y_i\}$; $x \odot y$ is the vector whose $i$-th coordinate is $x_i y_i$. We write $x \le y$ to denote that $x_i \le y_i$ for all $i \in [n]$. Let $\mathbf{0}$ (resp. $\mathbf{1}$) be the $n$-dimensional all-zeros (resp. all-ones) vector. Let $\mathbf{1}_S$ denote the indicator vector of $S \subseteq [n]$, i.e., the vector that has a $1$ in entry $i$ if and only if $i \in S$.
We will use the following result that was shown in previous work [7].
Lemma 2 ([7], Lemma 7).
Let $F: [0, 1]^n \to \mathbb{R}_{\ge 0}$ be a DR-submodular function. For all $x, y \in [0, 1]^n$, we have $F(x \vee y) \ge (1 - \|x\|_\infty) F(y)$.
3 The algorithm
In this section, we present an idealized version of our algorithm in which we assume that we can compute the step size on line 16 exactly. The idealized algorithm is given in Algorithm 1. In the appendix (Section B), we show how to implement that step efficiently while incurring only an additive error in the approximation.
The algorithm takes as input a target value and it achieves the desired approximation if is an approximation of the optimal function value , i.e., we have . As noted in previous work [10], it is straightforward to approximately guess such a value using a single parallel round.
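For intuition, the sketch below outlines one standard way such a guessing step could be implemented; the bracketing bounds and the geometric grid are our own illustrative assumptions and are not taken from the paper or the previous work it cites.

import numpy as np

def candidate_guesses(F, grad_F, n, eps):
    # Sketch: build a geometric grid of guesses for the optimal value from a
    # single batch of oracle queries (the batch can be issued in one parallel
    # round).
    zero = np.zeros(n)
    f0 = F(zero)
    singles = np.array([F(np.eye(n)[i]) for i in range(n)])   # parallel batch
    g0 = grad_F(zero)

    lo = max(float(singles.max()), f0)            # value of a simple feasible point
    hi = f0 + float(np.maximum(g0, 0.0).sum())    # F(x) <= F(0) + <grad F(0), x> for x >= 0
    if hi <= 0.0:
        return [0.0]                              # degenerate instance: F vanishes on the domain
    lo = max(lo, eps * hi)                        # guard against a zero lower bound
    guesses, M = [], lo
    while M <= (1 + eps) * hi:
        guesses.append(M)
        M *= (1 + eps)
    return guesses

# Usage sketch: run the main algorithm once for each guess M (all runs can be
# executed in parallel, so only one extra round is spent on the queries above)
# and return the best of the resulting solutions.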
Finding the step size on line 16. As mentioned earlier, we assume here that we can find the step size exactly. In the appendix, we show that we can efficiently find it approximately using a multiway search with a suitable branching factor. The branching factor can be chosen to obtain different trade-offs between the number of parallel rounds and the total running time; see Section B in the appendix for more details.
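To illustrate the kind of search this refers to, the sketch below approximately finds the largest step size in a bracket that satisfies a monotone acceptance predicate, probing several candidates per parallel round; the predicate, bracket, and branching factor are placeholders of our own and not the actual condition from Algorithm 1.

import numpy as np

def multiway_search(accept, delta_max, branching=8, tol=1e-6):
    # Sketch: find (approximately) the largest delta in [0, delta_max] with
    # accept(delta) == True, assuming accept is monotone (True on a prefix of
    # the interval).  Each iteration probes `branching` candidate step sizes,
    # all of which can be evaluated in one parallel round, so the number of
    # rounds is roughly log(delta_max / tol) / log(branching + 1).
    lo, hi = 0.0, delta_max
    while hi - lo > tol:
        probes = np.linspace(lo, hi, branching + 2)[1:-1]     # interior grid points
        results = [accept(d) for d in probes]                 # one parallel batch
        accepted = [d for d, ok in zip(probes, results) if ok]
        rejected = [d for d, ok in zip(probes, results) if not ok]
        lo = max(accepted) if accepted else lo
        hi = min(rejected) if rejected else hi
    return lo

# Toy usage with a placeholder predicate; a larger branching factor means more
# work per round but fewer parallel rounds.
step = multiway_search(lambda d: d <= 0.37, delta_max=1.0, branching=8)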
Finding the step size on line 18. We have and
Additionally, for each , we have and thus . Therefore is the minimum between and the following value:
4 Analysis of the approximation guarantee
In this section, we show that Algorithm 1 achieves a approximation. Recall that we assume that is computed exactly on line 16. In Section B of the appendix, we show how to extend the algorithm and the analysis so that the algorithm efficiently computes a suitable approximation to that suffices for obtaining a approximation.
In the following, we refer to each iteration of the outer for loop as a phase. We refer to each iteration of the inner while loop as an iteration. Note that the update vectors are nonnegative in each iteration of the algorithm, and thus the vectors remain nonnegative throughout the algorithm and they can only increase. Additionally, since , we have throughout the algorithm. We will also use the following observations repeatedly, whose straightforward proofs are deferred to Section A of the appendix. By DR-submodularity, since the relevant vectors can only increase in each coordinate, the relevant gradients can only decrease in each coordinate. This implies that, for every , we have . Additionally, for every , we have .
We will need upper bounds on the $\ell_\infty$ and $\ell_1$ norms of  and . Since , it suffices to upper bound the norms of  (the $\ell_1$ norm bound will be used to show that the final solution is feasible, and the $\ell_\infty$ norm bound will be used to derive the approximation guarantee). We do so in the following lemma.
Lemma 3.
Consider phase of the algorithm (the th iteration of the outer for loop). Throughout the phase, the algorithm maintains the invariant that and .
Proof.
We show that the invariants are maintained using induction on the number of iterations of the inner while loop in phase . Let be the vector right before the update on line 21 and let be the vector right after the update. By the induction hypothesis, we have . If , we have , and the invariant is maintained. Therefore we may assume that . By the definition of , we have . We have . Thus the invariant is maintained.
Next, we show the upper bound on the norm. Note that , where is the step size chosen on line 19. Thus we have , where the last inequality is by the choice of . ∎
Theorem 4.
Consider a phase of the algorithm (an iteration of the outer for loop). Let  and  be the vectors at the beginning and at the end of the phase, respectively. We have
Proof.
We consider two cases, depending on whether the threshold at the end of the phase is equal to or not.
Case 1: we have . Note that the phase terminates with in this case. We fix an iteration of the phase that updates and on lines 20–23, and analyze the gain in function value in the current iteration. We let denote the vectors right before the update on lines 20–23. Let be the vector right after the update on line 20, and let be the vector right after the update on line 21.
We have:
In (a), we used the fact that and is concave in nonnegative directions.
We can show (b) as follows. We have and thus by DR-submodularity. Additionally, for every coordinate , we have . Therefore, for every , we have .
In (c), we have used that , for all , and for all .
We can show (d) as follows. Since , we have , where the first inequality is by Lemma 14 and the second inequality is by the choice of .
Let and denote and in iteration of the phase (note that we are momentarily overloading and here: they temporarily stand for the step sizes in iterations and , and not for the step sizes on lines 16 and 18). By summing up the above inequality over all iterations, we obtain:
We can show (a) as follows. Recall that we have . Since , we have .
In (b), we used the definition of on line 7.
In (c), we used that .
Case 2: we have . Note that this implies that , since line 13 was executed at least once during the phase.
Let be the following subset of the coordinates:
Lemma 5.
We have
Proof.
Fix an iteration of the phase that updates and on lines 20–23. Let denote the vectors right before the update on lines 20–23. Let be the vector right after the update on line 20, and let be the vector right after the update on line 21.
We have:
In (a), we used the fact that and is concave in nonnegative directions.
We can show (b) as follows. We have and thus by DR-submodularity. Additionally, for every coordinate , we have . Therefore, for every , we have .
We can show (c) as follows. Since , we have , where the first inequality is by Lemma 14 and the second inequality is by the choice of . By the definition of , we have for every . By the definition of , we have for every .
Equality (d) follows from the fact that , which we can show as follows. We have , and is the set of all coordinates with negative gradient . Thus it suffices to show that the coordinates in have positive gradient . For every , we have , where the first inequality is by DR-submodularity (since ) and the second inequality is by the definition of and the fact that for all (Lemma 3). Moreover, we have for all (Lemma 3). Thus for all , and hence .
In (e), we have used that and is nonnegative on the coordinates of .
In (f), we have used that, for all .
In (g), we have used that and thus by DR-submodularity.
We will also need the following lemmas.
Lemma 6.
For every , we have:
Proof.
Since was empty at the previous threshold , we have or . If it is the latter, the claim follows, since . Therefore we may assume it is the former. By Lemma 3, . Therefore
where in the second inequality we used that for sufficiently small (since ). ∎
Lemma 7.
We have:
Proof.
We have:
In (a), we used that for all .
In (b), we used the fact that is concave in nonnegative directions.
In (c), we used the fact that the algorithm maintains the invariant that via the update on line 23.
In (d), we used Lemma 2.
In (e), we used Lemma 3. ∎
Recall that the phase terminates with either or . We consider each of these cases in turn.
Lemma 8.
Suppose that . We have:
Proof.
By Lemma 5, we have: