Monotone Submodular Maximization over a Matroid
via NonOblivious Local Search
Abstract
We present an optimal, combinatorial approximation algorithm for monotone submodular optimization over a matroid constraint. Compared to the continuous greedy algorithm (Calinescu, Chekuri, Pál and Vondrák, 2008), our algorithm is extremely simple and requires no rounding. It consists of the greedy algorithm followed by local search. Both phases are run not on the actual objective function, but on a related auxiliary potential function, which is also monotone submodular.
In our previous work on maximum coverage (Filmus and Ward, 2012), the potential function gives more weight to elements covered multiple times. We generalize this approach from coverage functions to arbitrary monotone submodular functions. When the objective function is a coverage function, both definitions of the potential function coincide.
Our approach generalizes to the case where the monotone submodular function has restricted curvature. For any curvature , we adapt our algorithm to produce a approximation. This matches results of Vondrák (2008), who has shown that the continuous greedy algorithm produces a approximation when the objective function has curvature , and proved that achieving any better approximation ratio is impossible in the value oracle model.
1 Introduction
In this paper, we consider the problem of maximizing a monotone submodular function , subject to a single matroid constraint. Formally, let be a set of elements and let be a function assigning a value to each subset of . We say that is submodular if
for all . If additionally, whenever , we say that is monotone submodular. Submodular functions exhibit (and are, in fact, alternately characterized by) the property of diminishing returns—if is submodular then for all . Hence, they are useful for modeling economic and gametheoretic scenarios, as well as various combinatorial problems. In a general monotone submodular maximization problem, we are given a value oracle for and a membership oracle for some distinguished collection of feasible sets, and our goal is to find a member of that maximizes the value of . We assume further that is normalized so that .
We consider the restricted setting in which the collection forms a matroid. Matroids are intimately connected to combinatorial optimization: the problem of optimizing a linear function over a hereditary set system (a set system closed under taking subsets) is solved optimally for all possible functions by the standard greedy algorithm if and only if the set system is a matroid [27, 8].
In the case of a monotone submodular objective function, the standard greedy algorithm, which takes at each step the element yielding the largest increase in while maintaining independence, is (only) a approximation [17]. Recently, Calinescu et al. [5, 28, 6] have developed a approximation for this problem via the continuous greedy algorithm, which is essentially a steepest ascent algorithm running in continuous time (when implemented, a suitably discretized version is used), producing a fractional solution. The fractional solution is rounded using pipage rounding [1] or swap rounding [7].
Feige [9] has shown that improving the bound is NPhard. Nemhauser and Wolsey [24] have shown that any improvement over requires an exponential number of queries in the value oracle setting.
Following Vondrák [29], we also consider the case when has restricted curvature. We say that has curvature if for any two disjoint ,
When , this is a restatement of monotonicity of , and when , linearity of . Vondrák [29] has shown that the continuous greedy algorithm produces a approximation when has curvature . Furthermore, he has shown that any improvement over requires an exponential number of queries in the value oracle setting.
1.1 Our contribution
In this paper, we propose a conceptually simple randomized polynomial time local search algorithm for the problem of monotone submodular matroid maximization. Like the continuous greedy algorithm, our algorithm delivers the optimal approximation. However, unlike the continuous greedy algorithm, our algorithm is entirely combinatorial, in the sense that it deals only with integral solutions to the problem and hence involves no rounding procedure. As such, we believe that the algorithm may serve as a gateway to further improved algorithms in contexts where pipage rounding and swap rounding break down, such as submodular maximization subject to multiple matroid constraints.
Our main results are a combinatorial approximation algorithm for monotone submodular matroid maximization, running in randomized time , and a combinatorial approximation algorithm running in randomized time , where is the rank of the given matroid and is the size of its ground set. Our algorithm further generalizes to the case in which the submodular function has curvature ^{1}^{1}1In fact, it is enough to assume that for any two disjoint independent sets .. In this case the approximation ratios obtained are and , respectively, again matching the performance of the continuous greedy algorithm [29]. Unlike the continuous greedy algorithm, our algorithm requires knowledge of . However, by enumerating over values of we are able to obtain a combinatorial algorithm even in the case that ’s curvature is unknown.^{2}^{2}2For technical reasons, we require that has curvature bounded away from zero in this case.
Our algorithmic approach is based on local search. In classical local search, the algorithm starts at an arbitrary solution, and proceeds by iteratively making small changes that improve the objective function, until no such improvement can be made. A natural, worstcase guarantee on the approximation performance of a local search algorithm is the locality ratio, given as , where is a locally optimal solution (i.e. a solution which cannot be improved by the small changes considered by the algorithm), is a global optimum, and is the objective function.
In many cases, classical local search may have a very poor locality ratio, implying that a locallyoptimal solution may be of significantly lower quality than the global optimum. For example, for monotone submodular maximization over a matroid, the locality ratio for an algorithm changing a single element at each step is [17]. Nonoblivious local search, a technique first proposed by Alimonti [2] and by Khanna, Motwani, Sudan and Vazirani [20], attempts to avoid this problem by making use of a secondary potential function to guide the search. By carefully choosing this auxiliary function, we ensure that poor local optima with respect to the original objective function are no longer local optima with respect to the new potential function. This is the approach that we adopt in the design of our local search algorithm. Specifically, we consider a simple local search algorithm in which the value of a solution is measured with respect to a carefully designed potential function , rather than the submodular objective function . We show that solutions which are locally optimal with respect to have significantly higher worstcase quality (as measured by the problem’s original potential function ) than those which are locally optimal with respect to .
In previous work [14], we designed an optimal nonoblivious local search algorithm for the restricted case of maximum coverage subject to a matroid constraint. In this problem, we are given a weighted universe of elements, a collection of sets, and a matroid defined on this collection. The goal is to find a collection of sets that is independent in the matroid and covers elements of maximum total weight. The nonoblivious potential function used in [14] gives extra weight to solutions that cover elements multiple times. That is, the potential function depends critically on the coverage representation of the objective function. In the present work, we extend this approach to general monotone submodular functions. This presents two challenges: defining a nonoblivious potential function without reference to the coverage representation, and analyzing the resulting algorithm.
In order to define the general potential function, we construct a generalized variant of the potential function from [14] that does not require a coverage representation. Instead, the potential function aggregates information obtained by applying the objective function to all subsets of the input, weighted according to their size. Intuitively, the resulting potential function gives extra weight to solutions that contain a large number of good subsolutions, or equivalently, remain good solutions in expectation when elements are removed by a random process. An appropriate setting of the weights defining our potential function yields a function which coincides with the previous definition for coverage functions, but still makes sense for arbitrary monotone submodular functions.
The analysis of the algorithm in [14] is relatively straightforward. For each type of element in the universe of the coverage problem, we must prove a certain inequality among the coefficients defining the potential function. In the general setting, however, we need to construct a proof using only the inequalities given by monotonicity and submodularity. The resulting proof is nonobvious and delicate.
This paper extends and simplifies previous work by the same authors. The paper [15], appearing in FOCS 2012, only discusses the case . The general case is discussed in [16], which appears in ArXiv. The potential functions used to guide the nonoblivious local search in both the unrestricted curvature case [15] and the maximum coverage case [14] are special cases of the function we discuss in the present paper.^{3}^{3}3The functions from [15, 16] are defined in terms of certain coefficients , which depend on a parameter . Our definition corresponds to the choice . We examine the case of coverage functions in more detail in Section 7.3. An exposition of the ideas of both [14] and [16] can be found in the second author’s thesis [31]. In particular, the thesis explains how the auxiliary objective function can be determined by solving a linear program, both in the special case of maximum coverage and in the general case of monotone submodular functions with restricted curvature.
1.2 Related work
Fisher, Nemhauser and Wolsey [25, 17] analyze greedy and local search algorithms for submodular maximization subject to various constraints, including single and multiple matroid constraints. They obtain some of the earliest results in the area, including a approximation algorithm for monotone submodular maximization subject to matroid constraints. A recent survey by Goundan and Schulz [19] reviews many results pertaining to the greedy algorithm for submodular maximization.
More recently, Lee, Sviridenko and Vondrák [23] consider the problem of both monotone and nonmonotone submodular maximization subject to multiple matroid constraints, attaining a approximation for monotone submodular maximization subject to constraints using local search. Feldman et al. [13] show that a local search algorithm attains the same bound for the related class of exchange systems, which includes the intersection of strongly base orderable matroids, as well as the independent set problem in claw free graphs. Further work by Ward [30] shows that a nonoblivious local search routine attains an improved approximation ratio of for this class of problems.
In the case of unconstrained nonmonotone maximization, Feige, Mirrokni and Vondrák [10] give a approximation algorithm via a randomized local search algorithm, and give an upper bound of in the value oracle model. Gharan and Vondrák [18] improved the algorithmic result to by enhancing the local search algorithm with ideas borrowed from simulated annealing. Feldman, Naor and Schwarz [12] later improved this to by using a variant of the continuous greedy algorithm. Buchbinder, Feldman, Naor and Schwartz have recently obtained an optimal approximation algorithm [4].
In the setting of constrained nonmonotone submodular maximization, Lee et al. [22] give a approximation algorithm for the case of matroid constraints and a approximation algorithm for knapsack constraints. Further work by Lee, Sviridenko and Vondrák [23] improves the approximation ratio in the case of matroid constraints to . Feldman et al. [13] attain this ratio for exchange systems. In the case of nonmonotone submodular maximization subject to a single matroid constraint, Feldman, Naor and Schwarz [11] show that a version of the continuous greedy algorithm attains an approximation ratio of . They additionally unify various applications of the continuous greedy algorithm and obtain improved approximations for nonmonotone submodular maximization subject to a matroid constraint or knapsack constraints.
1.3 Organization of the paper
We begin by giving some basic definitions in Section 2. In Section 3 we introduce our basic, nonoblivious local search algorithm, which makes use of an auxiliary potential function . In Section 4, we give the formal definition of , together with several of its properties. Unfortunately, exact computation of the function requires evaluating on an exponential number of sets. In Section 5 we present a simplified analysis of our algorithm, under the assumption that an oracle for computing the function is given. In Section 6 we then show how to remove this assumption to obtain our main, randomized polynomial time algorithm. The resulting algorithm uses a polynomialtime random sampling procedure to compute the function approximately. Finally, some simple extensions of our algorithm are described in Section 7.
2 Definitions
Notation
If is some Boolean condition, then
For a natural number, . We use to denote the th Harmonic number,
It is wellknown that , where is the natural logarithm.
For a set and an element, we use the shorthands and . We use the notation even when , in which case , and the notation even when , in which case .
Let be a set. A setfunction on is a function whose arguments are subsets of . For , we use . For , the marginal of with respect to is
Properties of setfunctions
A setfunction is normalized if . It is monotone if whenever then . It is submodular if whenever and is disjoint from , . If is monotone, we need not assume that and are disjoint. Submodularity is equivalently characterized by the inequality
for all and .
The setfunction has curvature if for all and , . Equivalently, for all disjoint . Note that if has curvature and , then also has curvature . Every normalized monotone function thus has curvature . A normalized function with curvature is linear; that is, .
Matroids
A matroid is composed of a ground set and a nonempty collection of subsets of satisfying the following two properties: (1) If and then ; (2) If and then for some .
The sets in are called independent sets. Maximal independent sets are known as bases. Condition (2) implies that all bases of the matroid have the same size. This common size is called the rank of the matroid.
One simple example is a partition matroid. The universe is partitioned into parts , and a set is independent if it contains at most one element from each part.
If is an independent set, then the contracted matroid is given by
Monotone submodular maximization
An instance of monotone submodular maximization is given by , where is a matroid and is a setfunction on which is normalized, monotone and submodular.
The optimum of the instance is
Because is monotone, the maximum is always attained at some basis.
We say that a set is an approximate solution if . Thus . We say that an algorithm has an approximation ratio of (or, simply that an algorithm provides an approximation) if it produces an approximate solution on every instance.
3 The algorithm
Our nonoblivious local search algorithm is shown in Algorithm 1. The algorithm takes the following input parameters:

A matroid , given as a ground set and a membership oracle for some collection of independent sets, which returns whether or not for any .

A monotone submodular function , given as a value oracle that returns for any .

An upper bound on the curvature of . The case in which the curvature of is unrestricted corresponds to .

A convergence parameter .
Throughout the paper, we let denote the rank of and .
The algorithm starts from an initial greedy solution , and proceeds by repeatedly exchanging one element in the current solution for one element not in , with the aim of obtaining an improved independent set . In both the initial greedy phase and the following local search phase, the quality of the solution is measured not with respect to , but rather with respect to an auxiliary potential function (as we discuss shortly, we in fact must use an estimate for ), which is determined by the rank of and the value of the curvature bound .
We give a full definition of in Section 4. The function is determined by a sequence of coefficients depending on the upper bound on the curvature of . Evaluating the function exactly will require an exponential number of value queries to . Nonetheless, in Section 6 we show how to modify Algorithm 1 by using a random sampling procedure to approximate . The resulting algorithm has the desired approximation guarantee with high probability and runs in polynomial time.
At each step we require that an improvement increase by a factor of at least . This, together with the initial greedy choice of , ensures that Algorithm 1 converges in time polynomial in and , at the cost of a slight loss in its locality gap. In Section 7 we describe how the small resulting loss in the approximation ratio can be recovered, both in the case of Algorithm 1, and in the randomized, polynomialtime variant we consider in Section 6.
4 The auxiliary objective function
We turn to the remaining task needed for completing the definition of Algorithm 1: giving a definition of the potential function . The construction we use for will necessarily depend on , but because we have fixed an instance, we shall omit this dependence from our notation, in order to avoid clutter.
4.1 Definition of
We now present a definition of our auxiliary potential function . Our goal is to give extra value to solutions that are robust with respect to small changes. That is, we would like our potential function to assign higher value to solutions that retain their quality even when some of their elements are removed by future iterations of the local search algorithm. We model this general notion of robustness by considering a random process that obtains a new solution from the current solution by independently discarding each element of with some probability. Then we use the expected value of to define our potential function
It will be somewhat more intuitive to begin by relating the marginals of to the marginals of , rather than directly defining the values of and . We begin by considering some simple properties that we would like to hold for the marginals, and eventually give a concrete definition of , showing that it has these properties.
Let be some subset of and consider an element . We want to define the marginal value . We consider a twostep random process that first selects a probability from an appropriate continuous distribution, then a set by choosing each element of independently with some probability . We then define so that is the expected value of over the random choice of .
Formally, let be a continuous distribution supported on with density given by . Then, for each , we consider the probability distribution on given by
Note that this is simply the expectation over our initial choice of of the probability that the set is obtained from by randomly selecting each element of independently with probability . Furthermore, for any and any , if then .
Given the distributions , we shall construct a function so that
(1) 
That is, the marginal value is the expected marginal gain in obtained when is added to a random subset of , obtained by the twostep experiment we have just described.
We can obtain some further intuition by considering how the distribution affects the values defined in (1). In the extreme example in which with probability 1, we have and so behaves exactly like the original submodular function. Similarly, if with probability 1, then for all , and so is in fact a linear function. Thus, we can intuitively think of the distribution as blending together the original function with some other “more linear” approximations of , which have systematically reduced curvature. We shall see that our choice of distribution results in a function that gives the desired locality gap.
It remains to show that it is possible to construct a function whose marginals satisfy (1). In order to do this, we first note that the probability depends only on and . Thus, if we define the values
for all , then we have . We adopt the convention that if either or is negative. Then, we consider the function given by:
(2) 
The marginals of this function are given by
The term evaluates to
We conclude that
The values used to define in (2) can be computed from the following recurrence, which will also play a role in our analysis of the locality gap of Algorithm 1.
Lemma 1.
, and for and ,
Proof.
For the base case, we have
The proof of the general case follows from a simple integration by parts:
In future proofs, we shall also need the following upper bound on the sum of the coefficients appearing in (2). Define
Lemma 2.
For all ,
Proof.
Expanding the definition of we obtain
where in the penultimate line, we have used Euler’s Beta integral:
whenever are positive integers. ∎
4.2 Properties of
We now show that our potential function shares many basic properties with .
Lemma 3.
The function is normalized, monotone, submodular and has curvature at most .
Proof.
From (2) we have . Thus, is normalized. Additionally, (1) immediately implies that is monotone, since the monotonicity of implies that each term is nonnegative. Next, suppose that and . Then from (1), we have
where the inequality follows from submodularity of . Thus, is submodular. Finally, for any set and any element , we have
where the inequality follows from the bound on the curvature of , and the second equation from setting in (1). Thus, has curvature at most . In fact, it is possible to show that for any given , has slightly lower curvature than , corresponding to our intuition that the distribution blends together and various functions of reduced curvature. For our purposes, however, an upper bound of is sufficient. ∎
Finally, we note that for any , it is possible to bound the value relative to .
Lemma 4.
For any ,
Proof.
Let and define for . The formula (1) implies that
Summing the resulting inequalities for to , we get
The lower bound then follows from the fact that both and are normalized, so .
4.3 Approximating via Sampling
Evaluating exactly requires evaluating on all subsets , and so we cannot compute directly without using an exponential number of calls to the value oracle . We now show that we can efficiently estimate by using a sampling procedure that requires evaluating on only a polynomial number of sets . In Section 6, we show how to use this sampling procedure to obtain a randomized variant of Algorithm 1 that runs in polynomial time.
We have already shown how to construct the function , and how to interpret the marginals of as the expected value of a certain random experiment. Now we show that the direct definition of in (2) can also be viewed a the result of a random experiment.
For a set , consider the distribution on given by
Then, recalling the direct definition of , we have:
We can estimate to any desired accuracy by sampling from the distribution . Let be independent random samples from . Then, we define:
(3) 
Lemma 5.
Choose , and set
Then,
Proof.
We use the following version of Hoeffding’s bound.
Fact (Hoeffding’s bound).
Let be i.i.d. nonnegative random variables bounded by , and let be their average. Suppose that . Then, for any ,
5 Analysis of Algorithm 1
We now give a complete analysis of the runtime and approximation performance of Algorithm 1. The algorithm has two phases: a greedy phase and a local search phase. Both phases are guided by the auxiliary potential function defined in Section 4. As noted in Section 4.3, we cannot, in general, evaluate in polynomial time. We postpone concerns dealing with approximating by sampling until the next section, and in this section suppose that we are given a value oracle returning for any set . We then show that Algorithm 1 requires only a polynomial number of calls to the oracle for . In this way, we can present the main ideas of the proofs without a discussion of the additional parameters and proofs necessary for approximating by sampling. In the next section we use the results of Lemma 5 to implement an approximate oracle for in polynomial time, and adapt the proofs given here to obtain a randomized, polynomial time algorithm.
Consider an arbitrary input to the algorithm. Let be the solution returned by Algorithm 1 on this instance and be an optimal solution to this instance. It follows directly from the definition of the standard greedy algorithm and the type of exchanges considered by Algorithm 1 that is a base. Moreover, because is montone, we may assume without loss of generality that is a base, as well. We index the elements of by using the following lemma of Brualdi [3].
Fact (Brualdi’s lemma).
Suppose are two bases in a matroid. There is a bijection such that for all , is a base. Furthermore, is the identity on .
The main difficulty in bounding the locality ratio of Algorithm 1 is that we must bound the ratio stated in terms of , by using only the fact that is locally optimal with respect to . Thus, we must somehow relate the values of and . In the following theorem relates the values of and on arbitrary bases of a matroid. Later, we shall apply this theorem to and to obtain an approximation guarantee both for Algorithm 1 and for the randomized variant presented in the next section.
Theorem 1.
Let and be any two bases of , and suppose that we index the elements of so that , where is the bijection guaranteed by Brualdi’s lemma. Then,
Proof.
The proof of Theorem 1 involves a chain of two inequalities and one equation, each of which we shall prove as a separate lemma. We consider the quantity:
First, we shall show in Lemma 6 that
and then in Lemma 7 that
for any . Combining these inequalities, we obtain
(4) 
Next, we show in Lemma 8 that
(5) 
We now prove each of the necessary claims.
Lemma 6.
For all ,
Proof.
The proof relies on the characterization of the marginals of given in (1). We consider two cases: and . If then the submodularity of implies
On the other hand, when , we must have by the definition of . Then,
where the second equality follows from the fact that if then . ∎
Lemma 7.
For any ,
Proof.
Our proof relies only on the submodularity and curvature of . Let , and . Furthermore, let be the set of indices such that . We separate the sum on the lefthand side into two parts, based on whether or not .
The first part of the sum is
where the final inequality follows from submodularity of . Next, using , we get
where the final line follows from the fact that has curvature at most and .
The second part of the sum is
where the first inequality follows from the fact that has curvature at most and the second inequality from submodularity of .
Putting both parts together, we deduce
where in the second inequality we have used and together with submodularity of . ∎
Lemma 8.
(6) 
Proof.
The proof relies primarily on the recurrence given in Lemma 1 for the values used to define . From the characterization of the marginals of given in (1) we have
Each subset appears in the expectation. Specifically, if then we have the term , and if then we have the term . Therefore the coefficient of in the lefthand side of (6) is thus given by
According to the recurrence for given in Lemma 1, the righthand side vanishes unless , in which case it is , or , in which case it is . ∎
We are now ready to prove this section’s main claim, which gives bounds on both the approximation ratio and complexity of Algorithm 1.
Theorem 2.
Algorithm 1 is a approximation algorithm, requiring at most evaluations of .
Proof.
We first consider the number of evaluations of required by Algorithm 1. The initial greedy phase requires evaluations of , as does each iteration of the local search phase. Thus, the total number of evaluations of required by Algorithm 1 is , where is the number of improvements applied in the local search phase. We now derive an upper bound on .
Let be the maximum value attained by on any independent set in . Algorithm 1 begins by setting to a greedy solution , and each time it selects an improved solution to replace by, we must have
Thus, the number of improvements that Algorithm 1 can apply is at most