# Non-Monotone Submodular Maximization with Multiple Knapsacks in Static and Dynamic Settings

Aneta Neumann (University of Adelaide, email: aneta.neumann@adelaide.edu.au), Francesco Quinzan (Hasso Plattner Institute, email: francesco.quinzan@hpi.de)

###### Abstract

We study the problem of maximizing a non-monotone submodular function under multiple knapsack constraints. We propose a simple discrete greedy algorithm to approach this problem, and prove that it yields strong approximation guarantees for functions with bounded curvature. In contrast to other heuristics, this requires no problem relaxation to continuous domains and it maintains a constant-factor approximation guarantee in the problem size. In the case of a single knapsack, our analysis suggests that the standard greedy can be used in non-monotone settings.

Additionally, we study this problem in a dynamic setting, in which knapsacks change during the optimization process. We modify our greedy algorithm to avoid a complete restart at each constraint update. This modification retains the approximation guarantees of the static case.

We evaluate our results experimentally on video summarization and sensor placement tasks. We show that our proposed algorithm competes with the state-of-the-art in static settings. Furthermore, we show that in dynamic settings with a tight computational time budget, our modified greedy yields significant improvements in solution quality over starting the greedy from scratch.

## 1 Introduction

Many artificial intelligence and machine learning tasks can be naturally approached by maximizing submodular objectives. Examples include subset selection [9], document summarization [23], video summarization [24] and action recognition [33]. Submodular functions are set functions that exhibit a diminishing returns property: adding an element to a smaller set helps more than adding it to a larger set. This property fully characterizes the notion of submodularity.

Practical applications often require additional side constraints on the solution space, determined by possible feasibility conditions. These constraints can be complex [23, 26, 31]. For instance, when performing video summarization tasks, we might want to select frames that only show a certain group of objects, and that fulfill cost constraints based on qualitative factors, such as resolution and luminance.
In this paper, we study general multiple knapsack constraints. Given a set of solutions, a -knapsack constraint consists of linear cost functions on the solution space and corresponding weights . A solution is then feasible if the corresponding costs do not exceed the weights. We study the problem of maximizing a submodular function under a -knapsack constraint.

Sometimes, real-world optimization problems involve dynamic and stochastic constraints [6]. For instance, resources and costs can exhibit slight but frequent changes, which lead to changes in the underlying space of feasible solutions. Various optimization problems have been studied under dynamically changing constraints, e.g., facility location problems [15], target tracking [11], and other submodular maximization problems for machine learning [4]. Motivated by these applications, we also study the problem of maximizing a submodular function under a -knapsack constraint, when the set of feasible solutions changes online.

#### Literature Overview.

Khuller, Moss and Naor [16] show that a simple greedy algorithm achieves a -approximation guarantee, when maximizing a modular function with a single knapsack constraint. They also propose a modified greedy algorithm that achieves a -approximation. Sviridenko [29] shows that this modified greedy algorithm yields a -approximation guarantee for monotone submodular functions under a single knapsack constraint. Its run time is function evaluations.

Lee et al. [22] give a -approximation local search algorithm for maximizing a non-monotone submodular function under multiple knapsack constraints. Its run time is polynomial in the problem size and exponential in the number of constraints. Fadaei, Fazli and Safari [10] propose an algorithm that achieves a ()-approximation for non-monotone functions. This algorithm requires computing fractional solutions of a continuous extension of the value oracle function . Chekuri, Vondrák and Zenklusen [5] improve the approximation ratio to , in the case of knapsacks. Kulik, Shachnai and Tamir [21] give a -approximation algorithm when is monotone and a -approximation algorithm when the function is non-monotone. Again, their method uses continuous relaxations of the discrete setting. Fantom can be considered the state-of-the-art algorithm for non-monotone submodular maximization [25]. It can handle intersections of a variety of constraints. In the case of multiple knapsack constraints, it achieves a -approximation in run time.

Submodular optimization problems with dynamic cost constraints, including knapsack constraints, are investigated in Roostapour et al. [28]. They show that a Pareto optimization approach can implicitly deal with dynamically changing constraint bounds, whereas a simple adaptive greedy algorithm fails.

#### Our Contribution.

Many of the aforementioned algorithmic results, despite having polynomial run time, seem impractical for large input applications. Following the analysis outlined in [7, 13, 16], we propose a simple, practical discrete algorithm to maximize a submodular function under multiple knapsack constraints. This algorithm, which we call the -Greedy, achieves a -approximation guarantee on this problem, with expressing the curvature of , and a constant. It requires at most function evaluations.
We also propose a robust variation of our -Greedy, which we call k-Greedy, to handle dynamic changes in the feasibility region of the solution space. We show that, in contrast to the -Greedy, this algorithm maintains a -approximation.

We demonstrate experimentally that our algorithms yield good performance in practice, with two real-world scenarios. First, we consider a video summarization task, which consists of selecting representative frames of a given video [26, 25]. We also consider a sensor placement problem, which asks to select informative thermal stations over a large territory [17].
We show that the -Greedy yields superior performance to commonly used algorithms for the static video summarization problem. We then perform experiments in dynamic settings with both scenarios, to show that the robust variation yields improvement in practice.

The paper is structured as follows. In Section 2 we introduce basic definitions and define the problem. In Section 3 we define the algorithms. We present the theoretical analysis in Section 4, and the experimental framework in Section 5. The experimental results are discussed in Section 6 and Section 7. We conclude in Section 8.

## 2 Preliminaries

### 2.1 Submodularity and Curvature

We assume that value oracle functions are submodular, as in the following definition.

###### Definition (Submodularity)

Given a finite set , a set function is submodular if one of the following three equivalent conditions holds:

1. for all and every we have that ;

2. for all we have that ;

3. for all and such that (s.t.) we have that .

To see that the conditions of Definition 2.1 are equivalent if is finite, see, e.g., Nemhauser et al. [27]. Note that condition in Definition 2.1 intuitively captures a notion of diminishing returns.
For any submodular function and sets , we define the marginal value of with respect to as . Note that, if only attains non-negative values, it holds that for all .
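
For reference, the three conditions and the marginal value are usually written as below. This is a sketch of the standard textbook forms, not a transcription of the definition above: the symbols $V$ (ground set), $f$, $A$, $B$, $x$, $y$ are notational assumptions, and the order of the conditions is assumed to match the list.

```latex
\begin{align*}
&\text{(1)}\quad f(A \cup \{x\}) - f(A) \;\ge\; f(B \cup \{x\}) - f(B)
  && \text{for all } A \subseteq B \subseteq V,\ x \in V \setminus B;\\
&\text{(2)}\quad f(A) + f(B) \;\ge\; f(A \cup B) + f(A \cap B)
  && \text{for all } A, B \subseteq V;\\
&\text{(3)}\quad f(A \cup \{x\}) + f(A \cup \{y\}) \;\ge\; f(A \cup \{x, y\}) + f(A)
  && \text{for all } A \subseteq V,\ x, y \in V \setminus A,\ x \neq y;\\
&\text{marginal value:}\quad \rho_{\Omega}(S) = f(S \cup \Omega) - f(S)
  && \text{for all } S, \Omega \subseteq V.
\end{align*}
```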

Our approximation guarantees use the notion of curvature, a parameter that bounds the maximum rate with which a submodular function changes. We say that a submodular function has curvature if the value does not change by a factor larger than when varying , for all . This parameter was first introduced by [7] and later revisited in [3]. We use the following definition of curvature, which slightly generalizes that proposed in Friedrich et al. [13].

###### Definition (Curvature)

Consider a submodular function . The curvature is the smallest scalar s.t.

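$$\rho_{\omega}\big((S \cup \Omega) \setminus \{\omega\}\big) \;\ge\; (1 - \alpha)\, \rho_{\omega}\big(S \setminus \{\omega\}\big),$$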

for all and .

Note that it always holds and that all monotone submodular functions have curvature bounded as . It follows that all submodular functions with negative curvature are non-monotone.
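
To make the curvature inequality concrete, the following minimal sketch checks it by brute force for a candidate value of α on a tiny ground set. It assumes that the inequality is quantified over all subsets S, Ω of the ground set and all elements ω, and it uses a hypothetical two-element coverage function (`COVERS`, `coverage`) purely as an illustration. The enumeration is exponential and only meant for toy instances.

```python
from itertools import combinations

def powerset(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = sorted(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def marginal(f, omega, S):
    """Marginal value rho_omega(S) = f(S + omega) - f(S)."""
    return f(S | {omega}) - f(S)

def satisfies_curvature(f, ground, alpha, tol=1e-12):
    """Check rho_w((S u O) - w) >= (1 - alpha) * rho_w(S - w) for all S, O, w.

    Assumption: S and O range over all subsets of `ground`, w over all elements.
    """
    for S in powerset(ground):
        for Omega in powerset(ground):
            for omega in ground:
                lhs = marginal(f, omega, (S | Omega) - {omega})
                rhs = (1 - alpha) * marginal(f, omega, S - {omega})
                if lhs < rhs - tol:
                    return False
    return True

# Hypothetical coverage function: f(S) is the number of covered points.
COVERS = {"a": {1, 2}, "b": {2, 3}}

def coverage(S):
    covered = set()
    for x in S:
        covered |= COVERS[x]
    return len(covered)
```

In this toy instance the marginal value of `a` drops from 2 to 1 once `b` is selected, so the inequality holds for α = 0.5 but fails for any smaller α. A modular function such as `len` satisfies it with α = 0.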

### 2.2 Problem Description

The problem of maximizing a submodular function under multiple knapsack constraints can be formalized as follows.

###### Problem

Let be a submodular function (we assume that , and that is non-constant). Consider linear cost functions (we assume that for all ) and corresponding weights , for all . We search for a set , such that .

In this setting, one has knapsacks and wishes to find an optimal set of items such that its total cost, expressed by the functions , does not violate the capacity of each knapsack. Note that the same set might have different costs for different knapsacks.
We denote with a set of feasible solutions obtained by the requirement for all , for all . For a -knapsack we define the quantity

$$\chi(c_i, W_i) \coloneqq \sup\{\, j \in [k] : \text{all sets of size at most } j \text{ are feasible} \,\}.$$

In other words, the value is the largest value s.t. all sets of cardinality are feasible solutions in the constraints .
Note that in the case of a single knapsack, if for all , then Problem 2.2 consists of maximizing a submodular function under a cardinality constraint, which is known to be NP-hard.
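
The feasibility condition and the quantity χ can be sketched in a few lines. The sketch below assumes non-negative costs, represents each linear cost function as a dictionary from items to costs, and lets the candidate size j range up to the number of items. The function names and the data layout are illustrative assumptions, not notation from the text.

```python
def is_feasible(S, costs, weights):
    """True if S respects every knapsack, i.e. the cost of S is at most W_i for all i."""
    return all(sum(c[x] for x in S) <= W for c, W in zip(costs, weights))

def chi(costs, weights):
    """Largest j such that every set of at most j items is feasible in all knapsacks.

    Assumption: costs are non-negative, so the most expensive set of size j
    in a knapsack consists of its j costliest items.
    """
    n = len(costs[0])
    best = 0
    for j in range(1, n + 1):
        worst_case_ok = all(
            sum(sorted(c.values(), reverse=True)[:j]) <= W
            for c, W in zip(costs, weights)
        )
        if not worst_case_ok:
            break
        best = j
    return best

# Two hypothetical knapsacks over the items a, b, c.
costs = [{"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 1, "c": 1}]
```

With weights `[5, 2]`, every pair of items is feasible in both knapsacks but the full set is not, so `chi` returns 2. Tightening the second weight to 1 reduces it to 1.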

In our analysis we always assume that the following reduction holds.

###### Reduction

For Problem 2.2 we may assume that there exists a point such that for all , and for all . Furthermore, we may assume that for all , for all .

If Reduction 2.2 does not hold, one can remove all points that violate one of the constraints, and add a point without altering the function . Intuitively, Reduction 2.2 requires that each singleton, except for one, is feasible for all knapsack constraints. This ensures that is always feasible in all constraints, since , and the output of the algorithm consists of at least one point. Furthermore, the point ensures that the solution quality never decreases throughout a greedy optimization process, until a non-feasible solution is reached.

We study a dynamic setting of Problem 2.2, in which weights are repeatedly updated throughout the optimization process, while the corresponding cost functions remain unchanged. In this setting, we assume that an algorithm queries a function to retrieve the weights, which are occasionally updated online. We assume that weight changes occur independently of the optimization process and algorithmic operations. Furthermore, we assume that Reduction 2.2 holds for each dynamic update.

## 3 Algorithms

We approach Problem 2.2 with a discrete algorithm based on a greedy technique, commonly used to maximize a submodular function under a single knapsack constraint (see [16, 32]). First, our algorithm defines the following partition of the objective space:

1. the set containing all s.t. for all ;

2. the complement containing all s.t. for all .

The -Greedy optimizes over the set , with a greedy update that depends on all cost functions . After finding a greedy approximate solution , the -Greedy finds the optimum among feasible subsets of . This step can be performed with a deterministic search over all possible solutions, since the space always has bounded size. The -Greedy outputs the set with highest -value among or the maximum among the singletons.
Note that the -Greedy algorithm depends on a parameter . As expressed in Theorem 4, the parameter sets a trade-off between solution quality and run time. For small , Algorithm 1 yields a better approximation guarantee and a worse run time than for large . This is due to the fact that the size of depends on this parameter. In practice, the parameter allows one to find the right trade-off between solution quality and run time, depending on available resources. Note that in the case of a single knapsack constraint, for the -Greedy is equivalent to the greedy algorithm studied in [16].

We modify the -Greedy to handle dynamic constraints in which weights change over time. This algorithm, which we refer to as the k-Greedy, is presented in Algorithm 2. It consists of two subroutines, which we call the greedy rule and the update rule.
The greedy rule of the k-Greedy uses the same greedy update as the -Greedy does: At each step, find a point that maximizes the marginal gain over maximum cost, and add to the current solution, if the resulting set is feasible in all knapsacks.
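
The greedy rule can be sketched as follows. This is a minimal reading of the sentence above, not the authors' Algorithm 1 or Algorithm 2: it assumes strictly positive costs, takes "maximum cost" to mean the largest cost of a point over all knapsacks, discards a point once it has been considered, and additionally stops as soon as no remaining point has a positive marginal gain. The coverage objective and all names are hypothetical.

```python
def greedy_rule(f, ground, costs, weights):
    """Repeatedly pick the point with the best gain-to-max-cost ratio.

    Assumptions: costs are strictly positive; a considered point is never
    revisited; the loop stops once the best marginal gain is non-positive.
    """
    def gain(x, S):
        return f(S | {x}) - f(S)

    def feasible(S):
        return all(sum(c[x] for x in S) <= W for c, W in zip(costs, weights))

    S, candidates = set(), set(ground)
    while candidates:
        # Sorting makes tie-breaking deterministic.
        best = max(sorted(candidates),
                   key=lambda x: gain(x, S) / max(c[x] for c in costs))
        candidates.remove(best)
        if gain(best, S) <= 0:
            break
        if feasible(S | {best}):
            S.add(best)
    return S

# Hypothetical coverage objective and two knapsacks.
COVERS = {"a": {1, 2, 3}, "b": {3, 4, 6}, "c": {5}}

def coverage(S):
    covered = set()
    for x in S:
        covered |= COVERS[x]
    return len(covered)

costs = [{"a": 3, "b": 1, "c": 1}, {"a": 1, "b": 2, "c": 1}]
weights = [4, 3]
solution = greedy_rule(coverage, COVERS.keys(), costs, weights)
```

On this instance the rule first selects `b` (ratio 3/2), then `c` (ratio 1), and rejects `a` because adding it would exceed the first knapsack.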
The update rule makes it possible to handle changes to the weights without restarting the algorithm from scratch. Following the notation of Algorithm 2, if new weights are given, then the k-Greedy iteratively removes points from the current solution, until the resulting set yields and . This is motivated by the following facts:

1. every set is feasible in both the old and the new constraints;

2. every set yields the same approximation guarantee in both constraints;

3. every set is s.t. for all , for all .

All three conditions are necessary to ensure that the approximation guarantee is maintained.

Note that the update rule in Algorithm 2 does not backtrack the execution of the algorithm until the resulting solution is feasible in the new constraint, and then adds elements to it. For instance, consider a set of five items under a single knapsack , with the cost function defined as

$$c(v_i) = \begin{cases} 1, & \text{if } 1 \le i \le 3; \\ 2, & \text{if } 3 < i \le 5, \end{cases}$$

and weight . For a given submodular value oracle function , i.e., , suppose that at some point during the optimization process a new weight is given, and suppose that, at that point, a solution of size is reached. Then, in this case, and the update rule removes a point from even if the weight increases. This holds since there exists a set of two elements that is feasible in the new constraint, but not in the old one.

We remark that combining the -Greedy with simple backtracking may result in losing the approximation guarantee, as discussed in [28, Theorem 3].

## 4 Approximation Guarantees

We prove that Algorithm 1 yields a strong approximation guarantee when maximizing a submodular function under knapsack constraints in the static case. This part of the analysis does not consider dynamic weight updates. We use the notion of curvature as in Definition 2.1. The following theorem holds.

###### Theorem

Let be a submodular function with curvature , and suppose that knapsacks are given. For all , the -Greedy is a -approximation algorithm for Problem 2.2. Its run time is .

A proof of this result is given in the Appendix. Note that if the function is monotone, then the approximation guarantee given in Theorem 4 matches well-known results [16]. We remark that non-monotone functions with bounded curvature are not uncommon in practice. For instance, all cut functions of directed graphs are non-monotone, submodular and have curvature , as discussed in [13].

We perform the run time analysis for the -Greedy in dynamic settings, in which weights change over time. The following theorem holds.

###### Theorem

Consider Algorithm 2 optimizing a submodular function with curvature , and knapsacks . Suppose that at some point during the optimization process new weights are given. Consider the set , and define

Then after additional run time the -Greedy finds a -approximate optimal solution in the new constraints, for a fixed parameter .

Note that this theorem yields the same theoretical approximation guarantee as the theorem for the static case. Hence, if dynamic updates occur at a slow pace, then it is possible to obtain similar results by combining the -Greedy with an appropriate restart strategy. However, we show in Section 7 that there is a significant advantage in using the k-Greedy in settings where frequent noisy constraint updates occur. Furthermore, we remark that the same analysis for the

## 5 Applications

#### Video Summarization.

A Determinantal Point Process (DPP) is a probabilistic model whose probability distribution function can be characterized as the determinant of a matrix. More formally, consider a sample space , and let be a positive semidefinite matrix. We say that defines a DPP on , if the probability of an event is given by the formula

$$P_L(S) = \frac{\det(L_S)}{\det(L + I)},$$

where is the submatrix of indexed by the elements in , and is the identity matrix. For a survey on DPPs and their applications see [20].

We model this framework with a matrix that describes similarities between pairs of frames. Intuitively, if describes the similarity between two frames, then the DPP prefers diversity.
In this setting, we search for a set of features s.t. is maximal, among sets of feasible solutions defined in terms of a knapsack constraint. Since is positive semidefinite, the function is submodular [20].
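
The DPP probability can be illustrated on a small similarity matrix. The sketch below evaluates the ratio det(L_S) / det(L + I) with a plain Gaussian-elimination determinant, so that it runs without third-party packages; in practice a numerical library routine such as NumPy's `numpy.linalg.slogdet` would be the usual choice. The 3×3 matrix is a hypothetical example in which frames 0 and 1 are near-duplicates.

```python
from itertools import combinations

def det(M):
    """Determinant via Gaussian elimination with partial pivoting (empty matrix -> 1)."""
    A = [list(row) for row in M]
    n, d = len(A), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[p][c]) < 1e-12:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= factor * A[c][k]
    return d

def dpp_probability(L, S):
    """P_L(S) = det(L_S) / det(L + I), with L_S the submatrix indexed by S."""
    idx = sorted(S)
    L_S = [[L[i][j] for j in idx] for i in idx]
    n = len(L)
    L_plus_I = [[L[i][j] + (i == j) for j in range(n)] for i in range(n)]
    return det(L_S) / det(L_plus_I)

def all_subsets(n):
    """All subsets of {0, ..., n-1}."""
    return [set(c) for r in range(n + 1) for c in combinations(range(n), r)]

# Hypothetical similarity matrix: frames 0 and 1 are similar, frame 2 is unrelated.
L = [[1.0, 0.9, 0.0],
     [0.9, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
```

On this matrix the diverse pair {0, 2} receives a higher probability than the redundant pair {0, 1}, and the probabilities of all subsets sum to one.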

#### Sensor Placement.

The maximum entropy sampling problem consists of choosing the most informative subset of random variables subject to side constraints. In this work, we study the problem of finding the most informative set among given Gaussian time series.

Let be a unique time series as described above. We consider the corresponding variation series , defined as . We compute the covariance matrix of the time series and , the coefficients of which we estimate as

$$\operatorname{cov}(\bar{X}, \bar{Y}) = \frac{1}{m-1} \sum_{t=1}^{m} \left(\bar{X}_t - \mathbb{E}[\bar{X}]\right)\left(\bar{Y}_t - \mathbb{E}[\bar{Y}]\right).$$

The entropy of a subset of time series is then given by the formula

$$f(S) = \frac{1 + \ln(2\pi)}{2}\,|S| + \frac{1}{2} \ln \det \Sigma(S)$$

for any indexing set on the variation series, where returns the determinant of the sub-matrix of indexed by . It is well-known that the function is non-monotone and submodular. Its curvature is bounded as , with its largest eigenvalue [30, 17, 13].
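
The covariance estimate and the entropy formula can be evaluated as in the following sketch. It assumes that the covariance matrix restricted to the chosen indices is positive definite, so that the log-determinant can be computed from a Cholesky factorization. The helper names and the toy inputs are illustrative assumptions.

```python
import math

def sample_cov(x, y):
    """Unbiased sample covariance of two equally long series."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (m - 1)

def log_det_spd(M):
    """ln det(M) for a symmetric positive definite M, via Cholesky (empty matrix -> 0)."""
    n = len(M)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                C[i][j] = math.sqrt(M[i][i] - s)  # fails if M is not positive definite
            else:
                C[i][j] = (M[i][j] - s) / C[j][j]
    return 2.0 * sum(math.log(C[i][i]) for i in range(n))

def entropy(Sigma, S):
    """f(S) = (1 + ln(2 pi)) / 2 * |S| + 1/2 * ln det Sigma(S)."""
    idx = sorted(S)
    sub = [[Sigma[i][j] for j in idx] for i in idx]
    return 0.5 * (1 + math.log(2 * math.pi)) * len(idx) + 0.5 * log_det_spd(sub)
```

A series with very small variance has negative entropy under this formula, so adding it to the empty set lowers the objective. This is one way to see that the function is non-monotone.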
We consider the problem of maximizing the entropy under a partition matroid constraint. This additional side constraint imposes upper bounds on the number of sensors that can be chosen in given geographical areas. Specifically, we partition the time series into seven sets, based on the continent in which the corresponding stations are located. Under this partition, we then have seven independent cardinality constraints, one for each continent.
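
A partition matroid constraint of this kind amounts to one counter per part. The following sketch checks it for a candidate set of stations; the station identifiers, the two-continent example and the function name are hypothetical.

```python
from collections import Counter

def respects_partition(S, part_of, bounds):
    """True if S contains at most bounds[p] elements from every part p.

    part_of maps each element to its part (e.g. a continent);
    a part without an entry in `bounds` is treated as having bound 0.
    """
    counts = Counter(part_of[x] for x in S)
    return all(counts[p] <= bounds.get(p, 0) for p in counts)

# Hypothetical stations and per-continent bounds.
part_of = {"s1": "Europe", "s2": "Europe", "s3": "Asia"}
bounds = {"Europe": 1, "Asia": 1}
```

Here `{"s1", "s3"}` is feasible, while `{"s1", "s2"}` selects two European stations and violates the bound.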

## 6 Static Experiments

The aim of these experiments is to show that the -Greedy yields good performance in comparison with Fantom [24], which is a popular algorithm for non-monotone submodular objectives under complex sets of constraints. We consider video summarization tasks as in Section 5.

Let be the matrix describing similarities between pairs of frames, as in Section 5. Following [14], we parametrize as follows. Given a set of frames, let be the feature vector of the -th frame. This vector encodes the contextual information about frame and its representativeness of other items. Then the matrix can be parametrized as

$$L_{i,j} = z_i^{T} W^{T} W z_j,$$

where is a hidden representation of , and are parameters. We use a single-layer neural network to train the parameters . We consider movies from the Frames Labeled In Cinema dataset [12]. Each movie has frames and generated ground summaries consisting of frames each.

We select a representative set of frames, by maximizing the function under additional quality feature constraints, viewed as multiple knapsacks. Hence, this task consists of maximizing a non-monotone submodular function under multiple knapsack constraints.
We run the -Greedy and Fantom algorithms on each instance, until no remaining point in the search space yields an improvement of the fitness value without violating side constraints. We then compare the resulting run time and approximation guarantee. Since Fantom depends on a parameter [24], we perform three sets of experiments for , , and . The parameter for the -Greedy is always set to . We have no indications that a lower yields improved solution quality on this set of instances.

Results for the run time and approximation guarantee are displayed in the figure for the static experiments. We clearly see that the -Greedy outperforms Fantom in terms of solution quality. Furthermore, the run time of Fantom is orders of magnitude worse than that of our -Greedy. This is probably due to the fact that Fantom requires a very low density threshold to reach a good solution on these instances.

## 7 Dynamic Experiments

The aim of these experiments is to show that, when constraints change quickly, the robust k-Greedy significantly outperforms the -Greedy with a restart policy that resets the optimization process each time new weights are given. To this end, we simulate a setting where the weights change dynamically, by introducing controlled posterior noise on the weights. At each update, we run the -Greedy from scratch, and let the k-Greedy continue without a restart policy. We consider two sets of dynamic experiments.

### 7.1 The Maximum Entropy Sampling Problem

We consider the problem of maximizing the entropy under a partition matroid constraint. This additional side constraint requires an upper bound on the number of sensors that can be chosen in given geographical areas. Specifically, we partition the time series into seven sets, based on the continent in which the corresponding stations are located. Under this partition, we then have seven independent cardinality constraints, one for each continent.
We use the Berkeley Earth Surface Temperature Study, which combines billion temperature reports from preexisting data archives. This archive contains over unique stations from around the world. More information on the Berkeley Earth project can be found in [2]. Here, we consider unique time series defined as the average monthly temperature for each station. Taking into account all data between years -, we obtain time series from the corresponding stations. Our experimental framework follows along the lines of [13].

In our dynamic setting, for each continent, a given parameter is defined as a percentage value of the overall number of stations available on that continent, for all . We let parameters vary over time, so as to simulate a setting where they are updated dynamically. This situation could occur when operational costs vary slightly over time. We initially set all parameters to use of the available resources, and we introduce a variation of these parameters at regular intervals, according to , a Gaussian distribution of mean and variance , for all .
We consider various choices for the standard deviation , but also various choices for the time span between one dynamic update and the next one (the parameter ). For each choice of and , we consider a total of sequences of changes. We perform statistical validation using the Kruskal-Wallis test with % confidence. In order to compare the results, we use the Bonferroni post-hoc statistical procedure. This method is used for multiple comparisons of a control algorithm against two or more other algorithms. We refer the reader to [8] for more detailed descriptions of these statistical tests.
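
The dynamic update scheme described above can be simulated with a seeded random generator. The sketch below is an assumption-laden illustration: it adds zero-mean Gaussian noise to the current value of each parameter every `tau` steps, clips the result to the interval [0, 1] since the parameters are percentages, and uses a hypothetical initial value of 0.5. The source does not state whether the noise is applied to the current or to the initial value, nor whether values are clipped.

```python
import random

def simulate_budgets(initial, sigma, tau, steps, seed=0):
    """Return the per-step budgets when noise N(0, sigma^2) is added every `tau` steps.

    Assumptions: noise accumulates on the current value, and budgets are
    clipped to [0, 1] because they represent percentages.
    """
    rng = random.Random(seed)
    current = list(initial)
    history = []
    for t in range(steps):
        if t > 0 and t % tau == 0:
            current = [min(1.0, max(0.0, b + rng.gauss(0.0, sigma)))
                       for b in current]
        history.append(tuple(current))
    return history

# Seven continents, hypothetical initial budget of 50%, an update every 5 steps.
history = simulate_budgets([0.5] * 7, sigma=0.05, tau=5, steps=12)
```

A restart-based baseline would rerun its optimization at steps 5 and 10 of this trace, whereas an update rule would only adjust the current solution at those steps.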

We compare the results in terms of the solution quality achieved at each dynamic update by the -Greedy and the k-Greedy. We summarize our results in Table 1 (left) as follows. The columns correspond to the results for the -Greedy and the k-Greedy respectively, along with mean value, standard deviation, and statistical comparison. The symbol is equivalent to the statement that the algorithm labelled as significantly outperformed the other one.
Table 1 (left) shows that the k-Greedy has a better performance than the -Greedy algorithm with restarts, when dynamic changes occur, especially for the highest frequencies . This shows that the k-Greedy is suitable in settings where frequent dynamic changes occur. The -Greedy yields improved performance with lower frequencies, but it underperforms the k-Greedy on our dataset.
Figure 1 (left) shows the solution quality values achieved by the -Greedy and the k-Greedy, for different choices of the standard deviation . Again, we observe that the k-Greedy finds solutions that have better quality than the -Greedy with restarts. Even though the k-Greedy in some cases aligns with the -Greedy with restarts, the performance of the k-Greedy is clearly better than that of the simple -Greedy with restarts.

### 7.2 Determinantal Point Processes

We conclude with a dynamic set of experiments on a video summarization task as in Section 5. We define the corresponding matrix using the quality-diversity decomposition, as proposed in [18]. Specifically, we define the coefficients of this matrix as

$$L_{i,j} = q(i)\, k(i,j)\, q(j),$$

with representing the quality of the -th frame and being the diversity between the -th and -th frame. For the quality measure, we use the byte size of the -th frame and a fixed parameter as follows

$$q(i) = \exp\left\{ \tfrac{1}{2}\, \theta\, b_i \right\}.$$

We choose such that the resulting eigenvalues of are of the form . This ensures that the resulting function is non-negative.
For the diversity measure , we compare commonly used descriptors for pictures. We use Color2, Color3, SIFT256, SIFT512 and GIST feature vectors, as described in [19]. Let . For each , let be the -feature vector of the -th frame. Then the diversity measure is defined as

$$k(i,j) = \exp\left\{ -\sum_{f \in F} \frac{\lVert v^f_i - v^f_j \rVert_2^2}{\sigma_f} \right\},$$

with a parameter for this feature (in our setting we combine the parameters and ). To learn these parameters we use the Markov Chain Monte Carlo (MCMC) method (see [1]).
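
The quality-diversity decomposition can be assembled into a kernel matrix as sketched below. The sketch reads the quality as exp(θ b_i / 2) and the diversity as the exponential of the negative sum, over feature types, of squared Euclidean distances divided by the per-feature parameter. The data layout (one dictionary of feature vectors per frame), the feature name `color` and all numeric values are hypothetical.

```python
import math

def quality(byte_size, theta):
    """q(i) = exp(theta * b_i / 2), with b_i the byte size of frame i."""
    return math.exp(0.5 * theta * byte_size)

def diversity(feats_i, feats_j, sigmas):
    """k(i, j) = exp(-sum_f ||v_i^f - v_j^f||_2^2 / sigma_f)."""
    total = 0.0
    for name, sigma in sigmas.items():
        sq_dist = sum((a - b) ** 2 for a, b in zip(feats_i[name], feats_j[name]))
        total += sq_dist / sigma
    return math.exp(-total)

def build_kernel(byte_sizes, features, theta, sigmas):
    """L[i][j] = q(i) * k(i, j) * q(j)."""
    n = len(byte_sizes)
    q = [quality(b, theta) for b in byte_sizes]
    return [[q[i] * diversity(features[i], features[j], sigmas) * q[j]
             for j in range(n)] for i in range(n)]

# Two hypothetical frames with a single feature type.
byte_sizes = [2.0, 1.0]
features = [{"color": [0.0, 1.0]}, {"color": [1.0, 1.0]}]
L = build_kernel(byte_sizes, features, theta=0.1, sigmas={"color": 2.0})
```

Since k(i, i) = 1, the diagonal entries reduce to q(i)², and the matrix is symmetric by construction.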

We use movie clips from the Frames Labeled In Cinema dataset [12]. We use 16 movies with 150-550 frames each to learn the parameters and one test movie with approximately 400 frames for our experiments. For each movie, we generate 5-10 samples (depending on the total amount of frames) of sets with 10-20 frames as training data. We then use MCMC on the training data to learn the parameters for each movie. When testing the -Greedy and the k-Greedy, we use the sample median of the trained parameters.

In this set of experiments, we consider a constraint by which the set of selected frames must not exceed a memory threshold. We define a cost function as the sum of the size of each frame in . As each frame comes with its own size in memory, choosing the best frames under certain memory budget is equivalent to maximizing a submodular function under a linear knapsack constraint.
The weight is given in the range , with respect to the total weight , and it is updated dynamically throughout the optimization process, according to a Gaussian distribution , for a given variance . This setting simulates a situation in which the overall available memory exhibits small frequent variations.

We select various parameter choices for the standard deviation , and the frequency with which a dynamic update occurs. We investigate the settings = , , , and = , , , , . Each combination of and carries out dynamic changes. Again, we validate our results using the Kruskal-Wallis test with % confidence. To compare the obtained results, we apply the Bonferroni post-hoc statistical test [8].

The results are presented in Table 1 (right). We observe that the k-Greedy yields better performance than the -Greedy with restarts when dynamic changes occur. Similar findings are obtained when comparing a different standard deviation choice = , , . Specifically, for the highest frequency , the k-Greedy achieves better results by approximately one order of magnitude.
Figure 1 (right) shows the solution quality values obtained by the k-Greedy and the -Greedy, as the frequency is set to . It can be observed that, for = , , , the k-Greedy significantly outperforms the -Greedy with restarts, for almost all updates.

## 8 Conclusion

Many real-world optimization problems can be approached as submodular maximization with multiple knapsack constraints (see Problem 2.2). Previous studies for this problem show that it is possible to approach this problem with a variety of heuristics. These heuristics often involve a local search, and require continuous relaxations of the discrete problem, and they are impractical. We propose a simple discrete greedy algorithm (see Algorithm 1) to approach this problem, that has polynomial run time and yields strong approximation guarantees for functions with bounded curvature (see Definition 2.1 and Theorem 4).

Furthermore, we study the problem of maximizing a submodular function when knapsack constraints involve dynamic components. We study a setting in which the weights of a given set of knapsack constraints change over time. To this end, we introduce a robust variation of our -Greedy algorithm that handles dynamic constraints online (see Algorithm 2). We prove that this operator maintains strong approximation guarantees for functions with bounded curvature, when constraints change dynamically (see Theorem 4).

We show that, in static settings, Algorithm 1 competes with Fantom, which is a popular algorithm for handling these constraints (see the figure for the static experiments).
Furthermore, we show that the k-Greedy is useful in dynamic settings. To this end, we compare the k-Greedy with the -Greedy combined with a restart policy, by which the optimization process starts from scratch at each dynamic update. We observe that the k-Greedy yields significant improvement over a restart in dynamic settings with limited computational time budget (see Figure 1 and Table 1).

## References

• [1] Raja Hafiz Affandi, Emily B. Fox, Ryan P. Adams, and Benjamin Taskar, ‘Learning the parameters of determinantal point process kernels’, in Proc. of ICML, pp. 1224–1232, (2014).
• [2] Berkeley Earth. The Berkeley Earth Surface Temperature Study. http://www.berkeleyearth.org, 2019.
• [3] Andrew An Bian, Joachim M. Buhmann, Andreas Krause, and Sebastian Tschiatschek, ‘Guarantees for greedy maximization of non-submodular functions with applications’, in Proc. of ICML, pp. 498–507, (2017).
• [4] Allan Borodin, Aadhar Jain, Hyun Chul Lee, and Yuli Ye, ‘Max-sum diversification, monotone submodular functions, and dynamic updates’, ACM Transactions on Algorithms, 13(3), 41:1–41:25, (July 2017).
• [5] Chandra Chekuri, Jan Vondrák, and Rico Zenklusen, ‘Submodular function maximization via the multilinear relaxation and contention resolution schemes’, SIAM Journal on Computing, 43(6), 1831–1879, (2014).
• [6] Raymond Chiong, Thomas Weise, and Zbigniew Michalewicz, Variants of evolutionary algorithms for real-world applications, Springer, 2012.
• [7] Michele Conforti and Gérard Cornuéjols, ‘Submodular set functions, matroids and the greedy algorithm: Tight worst-case bounds and some generalizations of the Rado-Edmonds theorem’, Discrete Applied Mathematics, 7(3), 251–274, (1984).
• [8] Gregory W. Corder and Dale I. Foreman, Nonparametric statistics for non-statisticians: a step-by-step approach, Wiley, 2009.
• [9] Abhimanyu Das and David Kempe, ‘Approximate submodularity and its applications: Subset selection, sparse approximation and dictionary selection’, Journal of Machine Learning Research, 19, 3:1–3:34, (2018).
• [10] Salman Fadaei, MohammadAmin Fazli, and MohammadAli Safari, ‘Maximizing non-monotone submodular set functions subject to different constraints: Combined algorithms’, Operations Research Letters, 39(6), 447 – 451, (2011).
• [11] Mahyar Fazlyab, Santiago Paternain, Victor M. Preciado, and Alejandro Ribeiro, ‘Interior point method for dynamic constrained optimization in continuous time’, in ACC, pp. 5612–5618. IEEE, (2016).
• [12] FLIC Dataset. Frames labeled in cinema.
• [13] Tobias Friedrich, Andreas Göbel, Frank Neumann, Francesco Quinzan, and Ralf Rothenberger, ‘Greedy maximization of functions with bounded curvature under partition matroid constraints’, in Proc. of AAAI, pp. 2272–2279, (2019).
• [14] Boqing Gong, Wei-Lun Chao, Kristen Grauman, and Fei Sha, ‘Diverse sequential subset selection for supervised video summarization’, in Proc. of NIPS, pp. 2069–2077, (2014).
• [15] Sanjay Dominik Jena, Jean-François Cordeau, and Bernard Gendron, ‘Solving a dynamic facility location problem with partial closing and reopening’, Computers & Operations Research, 67(C), 143–154, (March 2016).
• [16] Samir Khuller, Anna Moss, and Joseph Naor, ‘The budgeted maximum coverage problem’, Information Processing Letters, 70(1), 39–45, (1999).
• [17] Andreas Krause, Ajit Paul Singh, and Carlos Guestrin, ‘Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies’, Journal of Machine Learning Research, 9, 235–284, (2008).
• [18] Alex Kulesza and Ben Taskar, ‘Structured determinantal point processes’, in Proc. of NIPS, pp. 1171–1179, (2010).
• [19] Alex Kulesza and Ben Taskar, ‘k-DPPs: Fixed-size determinantal point processes’, in Proc. of ICML, pp. 1193–1200, (2011).
• [20] Alex Kulesza and Ben Taskar, ‘Determinantal point processes for machine learning’, Foundations and Trends in Machine Learning, 5(2-3), 123–286, (2012).
• [21] Ariel Kulik, Hadas Shachnai, and Tami Tamir, ‘Approximations for monotone and nonmonotone submodular maximization with knapsack constraints’, Mathematics of Operations Research, 38(4), 729–739, (2013).
• [22] Jon Lee, Vahab S. Mirrokni, Viswanath Nagarajan, and Maxim Sviridenko, ‘Non-monotone submodular maximization under matroid and knapsack constraints’, in Proc. of STOC, pp. 323–332, (2009).
• [23] Hui Lin and Jeff Bilmes, ‘Multi-document summarization via budgeted maximization of submodular functions’, in Proc. of NAACL-HLT, pp. 912–920, (2010).
• [24] Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, and Amin Karbasi, ‘Fast constrained submodular maximization: Personalized data summarization’, in Proc. of ICML, pp. 1358–1367, (2016).
• [25] Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, and Amin Karbasi, ‘Fast constrained submodular maximization: Personalized data summarization’, in Proc. of ICML, pp. 1358–1367, (2016).
• [26] Baharan Mirzasoleiman, Stefanie Jegelka, and Andreas Krause, ‘Streaming non-monotone submodular maximization: Personalized video summarization on the fly’, in Proc. of AAAI, pp. 1379–1386, (2018).
• [27] George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher, ‘An analysis of approximations for maximizing submodular set functions - I’, Mathematical Programming, 14(1), 265–294, (1978).
• [28] Vahid Roostapour, Aneta Neumann, Frank Neumann, and Tobias Friedrich, ‘Pareto optimization for subset selection with dynamic cost constraints’, in Proc. of AAAI, pp. 2354–2361, (2019).
• [29] Maxim Sviridenko, ‘A note on maximizing a submodular set function subject to a knapsack constraint’, Operations Research Letters, 32(1), 41–43, (2004).
• [30] Maxim Sviridenko, Jan Vondrák, and Justin Ward, ‘Optimal approximation for submodular and supermodular optimization with bounded curvature’, Mathematics of Operations Research, 42(4), (2017).
• [31] Qilian Yu, Easton Li Xu, and Shuguang Cui, ‘Streaming algorithms for news and scientific literature recommendation: Monotone submodular maximization with a $d$-knapsack constraint’, IEEE Access, 6, 53736–53747, (2018).
• [32] Haifeng Zhang and Yevgeniy Vorobeychik, ‘Submodular optimization with routing constraints’, in Proc. of AAAI, eds., Dale Schuurmans and Michael P. Wellman, pp. 819–826, (2016).
• [33] Jingjing Zheng, Zhuolin Jiang, Rama Chellappa, and P. Jonathon Phillips, ‘Submodular attribute selection for action recognition in video’, in Proc. of NIPS, pp. 1341–1349, (2014).

## Appendix: Missing Proofs

### Proof of Theorem 4

We first observe that the $\lambda$-Greedy performs at most steps, and that at each step it requires at most fitness evaluations, for a total of run time. This holds because Line 1 requires calls to the value oracle function, whereas Lines 6-12 of Algorithm 1 require at most calls to the value oracle function.

We now prove that the $\lambda$-Greedy yields the desired approximation guarantee. To this end, we define

$$V_{1} \coloneqq \{e \in V \colon c_{j}(e) \leq \lambda W_{j}/k \ \ \forall j \in [k]\} \quad \text{and} \quad V_{2} \coloneqq V \setminus V_{1}.$$
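The split of the ground set into $V_1$ and $V_2$ can be sketched in a few lines of Python. This is only an illustration of the definition above: the function name `split_ground_set`, the dictionary layout of the costs, and the toy numbers are assumptions made for this example and do not come from the paper.

```python
def split_ground_set(costs, budgets, lam):
    """Split elements into V1 (light in every knapsack) and V2 (the rest).

    costs:   dict mapping each element e to a list [c_1(e), ..., c_k(e)]
    budgets: list [W_1, ..., W_k]
    lam:     the parameter lambda in the threshold lambda * W_j / k
    """
    k = len(budgets)
    v1 = {
        e
        for e, c in costs.items()
        if all(c[j] <= lam * budgets[j] / k for j in range(k))
    }
    v2 = set(costs) - v1
    return v1, v2


# Hypothetical toy instance with k = 2 knapsacks; the threshold is 1.0 * 10 / 2 = 5.
costs = {"a": [1.0, 2.0], "b": [6.0, 1.0], "c": [2.0, 2.5]}
budgets = [10.0, 10.0]
v1, v2 = split_ground_set(costs, budgets, lam=1.0)
```

In this toy instance the element `"b"` exceeds the threshold in the first knapsack, so it is the only element placed in $V_2$.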

Without loss of generality we assume that . We denote with the weight of each knapsack. Let be a solution at time step . Let be the smallest index, such that

1. for all , for all ;

2. there exists such that .

In other words, is the first point in time such that the new greedy solution does not satisfy all knapsack constraints at the same time. We first prove that either the solution or the point yields a good approximation guarantee of .

To simplify the notation, we define and , for all . It holds that

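$$\begin{aligned}
f_{\sigma_{t-1}}(\textsc{opt} \cap V_{1}) &\leq \sum_{e \in \textsc{opt} \setminus \sigma_{t-1}} f_{\sigma_{t-1}}(e) && (1) \\
&= \sum_{e \in (\textsc{opt} \cap V_{1}) \setminus \sigma_{t-1}} \sup_{j} c_{j}(e) \, \frac{f_{\sigma_{t-1}}(e)}{\sup_{j} c_{j}(e)} && (2) \\
&\leq \frac{f_{t}}{\sup_{j} c_{j}(v_{t})} \sum_{e \in \textsc{opt} \setminus \sigma_{t-1}} \sup_{j} c_{j}(e) && (3) \\
&\leq \frac{\lambda W}{\sup_{j} c_{j}(v_{i})} \, f_{t} && (4)
\end{aligned}$$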

where (1) follows from the assumption that $f$ is submodular; (3) follows from (2) due to the greedy choice of Algorithm 1; (4) uses the fact that , together with the fact that for all , for all . Rearranging yields

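$$f_{t} \geq \frac{\sup_{j} c_{j}(v_{i})}{\lambda W} \Bigl( f\bigl(\sigma_{t-1} \cup (\textsc{opt} \cap V_{1})\bigr) - f(\sigma_{t-1}) \Bigr). \qquad (5)$$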
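Steps (2) and (3) rely on a greedy choice that compares marginal gain against the largest cost of an element over all knapsacks. The following Python sketch shows one such selection step. It is a reading of the ratio appearing in (2) and (3), not a transcription of Algorithm 1: the function name `greedy_step`, the toy coverage objective, and the cost table are all hypothetical, and all costs are assumed to be strictly positive.

```python
def greedy_step(f, current, candidates, costs):
    """Pick the candidate with the best marginal gain per unit of largest cost.

    f:          set function, called as f(frozenset_of_elements)
    current:    frozenset holding the current solution
    candidates: iterable of elements that may still be added
    costs:      dict mapping each element to its list of knapsack costs (all > 0)
    """
    base = f(current)
    best, best_ratio = None, float("-inf")
    for e in candidates:
        if e in current:
            continue
        gain = f(current | {e}) - base      # marginal value of e given current
        ratio = gain / max(costs[e])        # divide by sup_j c_j(e)
        if ratio > best_ratio:
            best, best_ratio = e, ratio
    return best, best_ratio


# Hypothetical toy objective: a coverage function (submodular).
covered = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}


def coverage(solution):
    return len(set().union(*(covered[e] for e in solution)))


costs = {"a": [3.0, 1.0], "b": [1.0, 1.0], "c": [2.0, 0.5]}
first = greedy_step(coverage, frozenset(), ["a", "b", "c"], costs)
second = greedy_step(coverage, frozenset({"b"}), ["a", "b", "c"], costs)
```

Here `"b"` is selected first because its gain of 2 is paid for with a largest cost of 1, whereas `"a"` gains 3 at a largest cost of 3.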

To continue with the proof, we consider the following lemma, which follows along the lines of Lemma 1 in [13].

###### Lemma

Following the notation introduced above, define the set . Then for any subset it holds

$$f(\sigma_{t} \cup T) \geq f(T) + (1-\alpha) f(\sigma_{t}) - (1-\alpha) \sum_{j \in [t] \cap J} f_{j}.$$

for all .

#### Proof:

From the definition of curvature we have that

$$f(\sigma_{i} \cup \textsc{opt}) - f(\sigma_{i-1} \cup \textsc{opt}) \geq (1-\alpha)\bigl(f(\sigma_{i}) - f(\sigma_{i-1})\bigr).$$

for all . It follows that

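$$\begin{aligned}
f(\sigma_{t} \cup \textsc{opt}) &\geq f(\sigma_{t-1} \cup \textsc{opt}) + (1-\alpha)\bigl(f(\sigma_{t}) - f(\sigma_{t-1})\bigr) && (6) \\
&\geq f(\emptyset \cup \textsc{opt}) + \sum_{j=1}^{t} (1-\alpha)\bigl(f(\sigma_{j}) - f(\sigma_{j-1})\bigr) && (7) \\
&= f(\textsc{opt}) + (1-\alpha)\bigl(f(\sigma_{t}) - f(\emptyset)\bigr), && (8)
\end{aligned}$$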

where (7) follows by iteratively applying (6) to the , and (8) follows by taking the telescopic sum.

Note that the lemma above yields , since if and only if the function is monotone. Combining this observation with (5) yields

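$$f_{t} \geq \frac{\sup_{j} c_{j}(v_{i})}{\sup(1,\alpha) \lambda W} f(\textsc{opt} \cap V_{1}) - \frac{\sup_{j} c_{j}(v_{i})}{\lambda W} \sum_{i \in [t-1]} f_{i},$$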

where we have simply used the telescopic sum over the . Defining for all we can write the inequality above as

$$\frac{\sup(1,\alpha) \lambda W}{\sup_{j} c_{j}(v_{t})} x_{t} + \sup(1,\alpha) \sum_{i \in [t-1]} x_{i} \geq 1. \qquad (9)$$

We conclude the proof by showing that any array of solutions with coefficients that fulfils the LP as in (9) yields

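$$\sum_{t \in [r]} x_{t} \geq \sum_{t \in [r]} \frac{\sup_{j} c_{j}(v_{t})}{\sup(1,\alpha) \lambda W} \prod_{i \in [t-1]} \left(1 - \frac{\sup_{j} c_{j}(v_{i})}{\lambda W}\right). \qquad (10)$$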

In order to prove (10), since it holds for all , we can simplify our setting, by studying the system

$$\frac{\sup(1,\alpha) \lambda W}{\sup_{j} c_{j}(v_{t})} x_{t} + \sup(1,\alpha) \sum_{i \in [t-1]} x_{i} = 1. \qquad (11)$$

This is due to the fact that the sum of the coefficients of any solution of (11) is upper-bounded by the sum of the coefficients of a solution of (10). We continue with the following simple lemma.

###### Lemma

Let be a solution of the LP as in (11). Then it holds

$$x_{t} \geq \frac{\sup_{j} c_{j}(v_{t})}{\sup(1,\alpha) \lambda W} \prod_{i \in [t-1]} \left(1 - \frac{\sup_{j} c_{j}(v_{i})}{\lambda W}\right),$$

for all .

#### Proof:

Define for all . Then the LP as in (11) can be written as

$$\frac{x_{t}}{c_{t}} = 1 - \sup(1,\alpha) \sum_{i=1}^{t-1} x_{i}.$$

By defining for all , we have that , and we obtain the following recurrence relation , for all . This is a linear recurrence with solutions

$$y_{t} = \prod_{j=1}^{t-1} \bigl(1 - \sup(1,\alpha) c_{j}\bigr).$$

The claim follows, by substituting in the equation above.
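The closed form in this proof can be checked numerically against the recurrence. The sketch below solves the system $x_t / c_t = 1 - \sup(1,\alpha) \sum_{i<t} x_i$ by forward substitution and compares the result with $c_t \prod_{j<t} (1 - \sup(1,\alpha) c_j)$. It assumes that $y_t$ stands for $x_t / c_t$ and that $\sup(1,\alpha)$ means $\max(1,\alpha)$; the coefficient values and function names are hypothetical.

```python
import math


def solve_forward(c, s):
    """Solve x_t / c_t = 1 - s * sum_{i < t} x_i by forward substitution."""
    xs, prefix = [], 0.0
    for ct in c:
        x = ct * (1.0 - s * prefix)
        xs.append(x)
        prefix += x
    return xs


def closed_form(c, s):
    """Evaluate x_t = c_t * prod_{j < t} (1 - s * c_j)."""
    xs, prod = [], 1.0
    for ct in c:
        xs.append(ct * prod)
        prod *= 1.0 - s * ct
    return xs


# Hypothetical coefficients c_t and curvature alpha, chosen for illustration.
c = [0.1, 0.25, 0.05, 0.2]
s = max(1.0, 1.5)  # sup(1, alpha) with alpha = 1.5
forward = solve_forward(c, s)
closed = closed_form(c, s)
agree = all(math.isclose(a, b) for a, b in zip(forward, closed))
```

Both routines produce the same sequence, which reflects the identity $y_{t+1} = y_t (1 - \sup(1,\alpha) c_t)$ obtained by subtracting consecutive equations of the system.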
Hence, it holds that

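$$\begin{aligned}
f(\sigma_{r}) &= f(\textsc{opt} \cap V_{1}) \sum_{t} x_{t} && (12) \\
&\geq f(\textsc{opt} \cap V_{1}) \sum_{t} \frac{\sup_{j} c_{j}(v_{t})}{\sup(1,\alpha) k W} \prod_{i \in [t-1]} \left(1 - \frac{\sup_{j} c_{j}(v_{i})}{\lambda W}\right) && (13) \\
&\geq \frac{f(\textsc{opt} \cap V_{1})}{\sup(1,\alpha)} \left(1 - \prod_{i \in [r]} \left(1 - \frac{\sup_{j} c_{j}(v_{i})}{\lambda W}\right)\right) && (14) \\
&\geq \frac{f(\textsc{opt} \cap V_{1})}{\sup(1,\alpha)} \left(1 - \exp\left\{-\sum_{i \in [r]} \frac{\sup_{j} c_{j}(v_{i})}{\lambda W}\right\}\right), && (15)
\end{aligned}$$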

where (12) holds by taking the telescopic sum; (13) follows from the preceding lemma; (14) follows via standard calculations; (15) follows because . Consider an index such that . It holds that

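$$\begin{aligned}
f(\sigma_{r}) &\geq \frac{1}{\sup(1,\alpha)} \left(1 - \exp\left\{-\sup(1,\alpha) \sum_{i \in [r]} \frac{\sup_{j} c_{j}(v_{i})}{\lambda W}\right\}\right) f(\textsc{opt} \cap V_{1}) \\
&\geq \frac{1}{\sup(1,\alpha)} \left(1 - \exp\left\{-\sup(1,\alpha) \sum_{i \in [r]} \frac{c_{\ell}(v_{i})}{\lambda W}\right\}\right) f(\textsc{opt} \cap V_{1}) \\
&\geq \frac{1}{\sup(1,\alpha)} \left(1 - e^{-1/\lambda}\right) f(\textsc{opt} \cap V_{1}).
\end{aligned}$$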
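The passage from the product in (14) to the exponential in (15) can be sanity-checked numerically. The sketch below assumes that this step rests on the elementary inequality $1 - x \leq e^{-x}$, which gives $\prod_i (1 - a_i) \leq \exp\{-\sum_i a_i\}$ whenever every $a_i$ lies in $[0, 1]$. The function name and the random test values are hypothetical.

```python
import math
import random


def product_vs_exponential(a):
    """Return (prod_i (1 - a_i), exp(-sum_i a_i)) for values a_i in [0, 1]."""
    product = math.prod(1.0 - ai for ai in a)
    exponential = math.exp(-sum(a))
    return product, exponential


# Check the inequality on random inputs; a small tolerance absorbs rounding.
rng = random.Random(0)
checks = []
for _ in range(1000):
    a = [rng.random() for _ in range(rng.randint(1, 10))]
    product, exponential = product_vs_exponential(a)
    checks.append(product <= exponential + 1e-12)
all_hold = all(checks)
```

Since the product never exceeds the exponential on these inputs, replacing it in (14) can only decrease the lower bound, which is the direction the derivation needs.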

We conclude by proving that Algorithm 1 yields the desired approximation guarantee. To this end, we observe that

$$\operatorname*{arg\,max}_{\{U \subseteq V_{2} \colon c_{j}(U) \leq W_{j} \ \forall j \in [k]\}} f(U) \geq f(\textsc{opt} \cap V_{2}).$$

Hence, following the notation of Algorithm 1, and denoting with the point with maximum $f$-value among the singletons, it follows that

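$$\begin{aligned}
\operatorname*{arg\,max}_{U \in V} \{f(\sigma_{r-1}), f_{U}\} &\geq \frac{1}{3}\bigl(f(\sigma_{r-1}) + 2 f_{U}\bigr) \\
&\geq \frac{1}{3}\bigl(f(\sigma_{r-1}) + f(v_{t}) + f_{U}\bigr) \\
&\geq \frac{1}{3}\bigl(f(\sigma_{r}) + f_{U}\bigr) \\
&\geq \frac{1}{3 \sup(1,\alpha)} \left(1 - e^{-1/\lambda}\right) \bigl(f(\textsc{opt} \cap V_{1}) + f(\textsc{opt} \cap V_{2})\bigr) \\
&\geq \frac{1}{3 \sup(1,\alpha)} \left(1 - e^{-1/\lambda}\right) f(\textsc{opt}),
\end{aligned}$$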

where the last inequality follows from submodularity. The claim follows.
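To get a feeling for the size of the final constant, the factor $\frac{1}{3 \sup(1,\alpha)} (1 - e^{-1/\lambda})$ can be evaluated for a few parameter values. The helper below is only a calculator for that expression, reading $\sup(1,\alpha)$ as $\max(1,\alpha)$; the function name and the sample values of $\alpha$ and $\lambda$ are hypothetical.

```python
import math


def approximation_factor(alpha, lam):
    """Evaluate (1 - exp(-1 / lam)) / (3 * max(1, alpha))."""
    return (1.0 - math.exp(-1.0 / lam)) / (3.0 * max(1.0, alpha))


# Sample values chosen for illustration only.
factor_base = approximation_factor(alpha=1.0, lam=1.0)
factor_high_curvature = approximation_factor(alpha=2.0, lam=1.0)
factor_large_lambda = approximation_factor(alpha=1.0, lam=4.0)
```

The expression decreases as either $\alpha$ grows beyond 1 or $\lambda$ grows, so within this bound smaller values of both parameters give a stronger guarantee.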

### Proof of Theorem 4

We prove that the claim holds after weight updates were given since the beginning of the optimization process. We denote with the -th new sequence of dynamic weights. Furthermore, let be the current solution at time step , after new dynamic weights were given. Let be the smallest index, such that

1. , for all , and for all ;

2. , for some .

In other words, is the first point in time, after the -th weight update, such that the set maximizing the greedy step is not feasible. We prove that either the solution or the point yields the desired approximation guarantee. Note that at each step, the $\lambda$-Greedy with the update operator requires at most calls to the value oracle function. Since for all and , the $\lambda$-Greedy with restarts requires additional run time to construct the solution .

In order to prove the desired lower bound on the approximation guarantee, we prove that the solution is equal to a solution of the same size constructed by the $\lambda$-Greedy starting from the empty set, under side constraints specified by . We then show that either or yields the desired approximation guarantee, by readily applying Theorem 4.

Define and denote with the points of sorted in the order that they were added to the solution. Define the sets

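$$\hat{\sigma}_{\ell,t} = \emptyset \ \text{ for } t = 0; \qquad \hat{\sigma}_{\ell,t} = \{v_{\ell,1}, \dots, v_{\ell,t}\} \ \text{ for } t \in [d_{\ell}];$$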

Note that, according to this definition, it holds for all . We prove that the solution is equal to a solution of the same size constructed by the $\lambda$-Greedy from scratch, by showing that it holds

 vℓ,t</