Non-submodular Function Maximization subject to a Matroid Constraint, with Applications

Kashayar Gatmiry, Sharif University, kgatmiry@ce.sharif.edu
Manuel Gomez Rodriguez, Max Planck Institute for Software Systems, manuelgr@mpi-sws.org

Abstract

The standard greedy algorithm has recently been shown to enjoy approximation guarantees for constrained non-submodular nondecreasing set function maximization. While these recent results allow us to better characterize the empirical success of the greedy algorithm, they are only applicable to simple cardinality constraints. In this paper, we study the problem of maximizing a non-submodular nondecreasing set function subject to a general matroid constraint. We first show that the standard greedy algorithm offers an approximation factor that depends on the submodularity ratio $\gamma$ of the function and the rank $r$ of the matroid. Then, we show that the same greedy algorithm offers a constant approximation factor that depends only on the generalized curvature $\alpha$ of the function. In addition, we demonstrate that these approximation guarantees are applicable to several real-world applications in which the submodularity ratio and the generalized curvature can be bounded. Finally, we show that our greedy algorithm achieves competitive performance in practice using a variety of experiments on synthetic and real-world data.

1 Introduction

The problem of maximizing a nondecreasing set function emerges in a wide variety of important real-world applications, such as feature selection, sparse modeling, experimental design, graph inference, and link recommendation, to name a few. If the set function of interest is nondecreasing and satisfies a natural diminishing-returns property called submodularity (a set function $f$ is submodular iff $f(v \mid A) \geq f(v \mid B)$ for all $A \subseteq B \subseteq V$ and $v \in V \setminus B$, where $V$ is the ground set), the problem is (relatively) well understood. For example, under a simple cardinality constraint, it is known that the standard greedy algorithm enjoys an approximation factor of $1 - 1/e$ (Nemhauser et al., 1978; Vondrák, 2008). Moreover, this constant factor has been improved using the curvature (Conforti and Cornuéjols, 1984; Vondrák, 2010) of a submodular function, which quantifies how close a submodular function is to being modular. Under a general matroid constraint, a variation of the standard greedy algorithm yields a $1/2$-approximation (Fisher et al., 1978) and, more recently, it has been shown that there exist polynomial-time algorithms that yield a $(1 - 1/e)$-approximation (Calinescu et al., 2011; Filmus and Ward, 2012).

However, there are many important applications, from subset selection (Altschuler et al., 2016), sparse recovery (Candes et al., 2006), and dictionary selection (Das and Kempe, 2011) to experimental design (Krause et al., 2008), where the corresponding set function is not submodular. In this context, Bian et al. (2017) have shown that, under a cardinality constraint, the standard greedy algorithm enjoys an approximation factor of $\frac{1}{\alpha}(1 - e^{-\alpha\gamma})$, where $\gamma$ is the submodularity ratio (Das and Kempe, 2011) of the set function, which characterizes how close the function is to being submodular, and $\alpha$ is the curvature of the set function. Very recently, Harshaw et al. (2019) have also shown that there is no polynomial-time algorithm with better guarantees. However, the problem of maximizing a non-submodular nondecreasing set function subject to a general matroid constraint has only been studied very recently by Chen et al. (2018), who have shown that a randomized version of the standard greedy algorithm enjoys an approximation factor of $(1 + 1/\gamma)^{-2}$, where $\gamma$ is again the submodularity ratio. In this paper, we make the following contributions:

  • We show that the standard greedy algorithm yields an approximation factor that depends on the submodularity ratio $\gamma$ of the function and decreases with the rank $r$ of the matroid. While this approximation factor is worse than the one by Chen et al. (2018), this result shows that the standard greedy algorithm, which is deterministic and simpler, does enjoy nontrivial theoretical guarantees.

  • We show that the standard greedy algorithm yields a constant approximation factor that depends only on the generalized curvature $\alpha$ of the function, as defined in previous work (Lehmann et al., 2006; Hassani et al., 2017; Bogunovic et al., 2018).

  • We show that the approximation guarantees from our theoretical analysis are applicable to a wide range of real-world applications, including tree-structured Gaussian graphical model estimation and visibility maximization in link recommendation, in which the submodularity ratio and the generalized curvature can be bounded.

  • We show that the standard greedy algorithm achieves competitive performance in practice using a variety of experiments on synthetic and real-world data.

Here, we focus on $\gamma$-weakly submodular functions (Bian et al., 2017; Das and Kempe, 2011); however, we would like to acknowledge that there are other types of non-submodular set functions that have been studied in the literature in recent years, namely, approximately submodular functions (Krause et al., 2008), weak submodular functions (Borodin et al., 2014), set functions with restricted and shifted submodularity (Du et al., 2008), and $\epsilon$-approximately submodular functions (Horel and Singer, 2016). Moreover, it would be interesting to extend our study to robust non-submodular function maximization (Bogunovic et al., 2018).

Notation. We use capital italic letters to denote sets and refer to $V$ as the ground set. We use $f : 2^V \to \mathbb{R}_{\geq 0}$ to represent a set function and define the marginal gain of a subset $B$ given $A$ as $f(B \mid A) = f(A \cup B) - f(A)$. Whenever $B = \{v\}$ is a singleton, we write $f(v \mid A)$ instead of $f(\{v\} \mid A)$ for simplicity.

2 Preliminaries

In this section, we start by revisiting the definitions of matroids and $\gamma$-weakly submodular functions (Bian et al., 2017; Das and Kempe, 2011). Then, we define $\alpha$-submodular functions, a subclass of $\gamma$-weakly submodular functions defined in terms of the generalized curvature $\alpha$ (Lehmann et al., 2006; Hassani et al., 2017; Bogunovic et al., 2018). Finally, we establish a relationship between $\alpha$-submodular functions and set functions representable as a difference between submodular functions.

Matroids are combinatorial structures that generalize the notion of linear independence in matrices. More formally, a matroid can be defined as follows (Fujishige, 2005; Schrijver, 2003):

Definition. A matroid is a pair $\mathcal{M} = (V, \mathcal{I})$ defined over a ground set $V$ and a family of sets $\mathcal{I} \subseteq 2^V$ (the independent sets) that satisfies three axioms:

  1. Non-emptiness: the empty set $\emptyset \in \mathcal{I}$.

  2. Heredity: if $B \in \mathcal{I}$ and $A \subseteq B$, then $A \in \mathcal{I}$.

  3. Exchange: if $A, B \in \mathcal{I}$ and $|A| < |B|$, then there exists $v \in B \setminus A$ such that $A \cup \{v\} \in \mathcal{I}$.

The rank of a matroid is the maximum size of an independent set in the matroid.
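To make the constraint concrete, here is a minimal sketch (our own illustration, not part of the original text) of independence oracles for the two matroids that appear most often in the applications below: the uniform matroid, which encodes a cardinality constraint, and the partition matroid, which encodes per-group capacities. The function and variable names are our own choices.

```python
from collections import Counter

def uniform_matroid(k):
    """Independent sets: all subsets of size at most k (cardinality constraint)."""
    return lambda S: len(S) <= k

def partition_matroid(block_of, capacity):
    """Independent sets: subsets with at most capacity[b] elements in each block b."""
    def is_independent(S):
        counts = Counter(block_of[v] for v in S)
        return all(counts[b] <= capacity[b] for b in counts)
    return is_independent

# Toy example: at most 2 recommended links per broadcaster.
block_of = {"e1": "u1", "e2": "u1", "e3": "u2", "e4": "u2", "e5": "u2"}
is_independent = partition_matroid(block_of, capacity={"u1": 2, "u2": 2})
print(is_independent({"e1", "e3", "e4"}))  # True
print(is_independent({"e3", "e4", "e5"}))  # False: three links for broadcaster u2
```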

A $\gamma$-weakly submodular function is defined in terms of the submodularity ratio $\gamma$:

Definition. A set function $f$ is $\gamma$-weakly submodular if

$\sum_{v \in B \setminus A} f(v \mid A) \;\geq\; \gamma\, f(B \mid A) \qquad \forall\, A, B \subseteq V,$   (1)

where the largest $\gamma$ for which the above inequality holds is called the submodularity ratio. Submodular functions have submodularity ratio $\gamma = 1$. An $\alpha$-submodular function is defined in terms of the generalized curvature $\alpha$:

Definition. A set function $f$ is $\alpha$-submodular if, for any $v \in V$ and subsets $A \subseteq B \subseteq V \setminus \{v\}$,

$f(v \mid A) \;\geq\; (1 - \alpha)\, f(v \mid B),$   (2)

where the smallest $\alpha$ for which the above inequality holds is called the generalized curvature. As shown very recently (Bogunovic et al., 2018; Halabi et al., 2018), there is a relationship between $\alpha$-submodular functions and $\gamma$-weakly submodular functions:

Proposition. Given a set function $f$ with generalized curvature $\alpha$, its submodularity ratio satisfies $\gamma \geq 1 - \alpha$.

Moreover, the following proposition establishes a relationship between $\alpha$-submodular functions and (nondecreasing) set functions representable as a difference between submodular functions (note that any set function can be expressed as a difference between two submodular functions (Narasimhan and Bilmes, 2012)):

Proposition. Given a set function $f = g - c$, where $g$ and $c$ are nondecreasing submodular functions, let $\alpha$ be the smallest constant (such a constant always exists if $f$ is nondecreasing) such that

$c(v \mid A) \;\leq\; \alpha\, g(v \mid A)$   (3)

for all $A \subseteq V \setminus \{v\}$ and $v \in V$. Then, $f$ has generalized curvature at most $\alpha$.

Proof.

Let $A \subseteq B \subseteq V \setminus \{v\}$. Then, we have:

$f(v \mid A) = g(v \mid A) - c(v \mid A) \;\geq\; (1-\alpha)\, g(v \mid A) \;\geq\; (1-\alpha)\, g(v \mid B) \;\geq\; (1-\alpha)\, f(v \mid B),$

where the first inequality follows from the definition of $\alpha$, and the second and third inequalities follow from the submodularity of $g$ and the monotonicity of $c$, respectively. ∎

In general, set functions representable as a difference between submodular functions cannot be approximated in polynomial time (Iyer and Bilmes, 2012); however, the above result identifies a particular class of these functions for which the standard greedy algorithm achieves approximation guarantees.
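For intuition, the submodularity ratio and the generalized curvature of a small set function can be computed by brute force directly from the two definitions above. The sketch below is our own illustration (exponential in $|V|$, so only useful for toy examples) and assumes the definitions exactly as stated in Eqs. (1) and (2); all names are ours.

```python
from itertools import chain, combinations

def powerset(V):
    return chain.from_iterable(combinations(V, r) for r in range(len(V) + 1))

def gain(f, v, A):
    return f(set(A) | {v}) - f(set(A))

def submodularity_ratio(f, V):
    """Largest gamma with sum_{v in B\\A} f(v|A) >= gamma * f(B|A) for all A, B (Eq. 1)."""
    gamma = 1.0
    for A in map(set, powerset(V)):
        for B in map(set, powerset(V)):
            denom = f(A | B) - f(A)
            if denom > 1e-12:
                gamma = min(gamma, sum(gain(f, v, A) for v in B - A) / denom)
    return gamma

def generalized_curvature(f, V):
    """Smallest alpha with f(v|A) >= (1 - alpha) * f(v|B) for all A subset of B (Eq. 2)."""
    alpha = 0.0
    for B in map(set, powerset(V)):
        for v in V - B:
            for A in map(set, powerset(B)):
                if gain(f, v, B) > 1e-12:
                    alpha = max(alpha, 1.0 - gain(f, v, A) / gain(f, v, B))
    return alpha

# Toy non-submodular example: f(S) = |S|^1.2, whose marginal gains increase with |S|.
V = {1, 2, 3, 4}
f = lambda S: len(S) ** 1.2
print(submodularity_ratio(f, V), generalized_curvature(f, V))
```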

Remarks. The notion of $\alpha$-submodularity is fundamentally different from $\epsilon$-approximate submodularity (Horel and Singer, 2016). More specifically, there exist $\epsilon$-approximately submodular functions, with $\epsilon$ arbitrarily close to zero, whose generalized curvature is arbitrarily close to one. Define the following set functions and over :

Then, it can be readily shown that, for , is submodular and, if , then , which implies that is $\epsilon$-approximately submodular. However,

(4)

as . Therefore, the generalized curvature of the function can approach one arbitrarily closely, which proves our claim.

3 Approximation Guarantees

In this section, we show that the standard greedy algorithm (Nemhauser et al., 1978), summarized in Algorithm 1, enjoys approximation guarantees for maximizing non-submodular nondecreasing set functions under a matroid constraint with rank $r$, i.e.,

$\underset{S \subseteq V}{\text{maximize}}\;\; f(S) \quad \text{subject to} \quad S \in \mathcal{I}.$   (5)

More specifically, we first show that the greedy algorithm offers an approximation factor that depends on $\gamma$ and $r$ for $\gamma$-weakly submodular functions whenever the rank $r$ exceeds a small constant. Note that, whenever $r$ is below that constant, we can check all of the sets in the matroid by brute force without increasing the time complexity of the greedy algorithm and, hence, we omit those cases from our analysis. Then, we show that the greedy algorithm enjoys an approximation factor that depends only on $\alpha$ for $\alpha$-submodular functions, independently of the rank of the matroid.

Input: ground set $V$, matroid $\mathcal{M} = (V, \mathcal{I})$, non-submodular nondecreasing set function $f$
Output: set of items $S \in \mathcal{I}$
1:  $S \leftarrow \emptyset$
2:  $R \leftarrow V$
3:  while $R \neq \emptyset$ do
4:     repeat
5:        $v \leftarrow \arg\max_{u \in R} f(u \mid S)$
6:        $R \leftarrow R \setminus \{v\}$        % item $v$ is considered
7:     until $S \cup \{v\} \in \mathcal{I}$ or $R = \emptyset$
8:     if $S \cup \{v\} \in \mathcal{I}$ then
9:        $S \leftarrow S \cup \{v\}$        % item $v$ is selected
10:    end if
11: end while
12: return $S$
Algorithm 1: Greedy algorithm
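For concreteness, the following Python sketch (ours, not from the paper) implements the greedy procedure of Algorithm 1 against a value oracle f and a matroid independence oracle such as the ones sketched in Section 2; the function names and the oracle interface are our own choices.

```python
def greedy_matroid(V, f, is_independent):
    """Greedy of Algorithm 1: repeatedly consider the remaining item with the
    largest marginal gain and select it if it keeps the solution independent."""
    S, remaining = set(), set(V)
    while remaining:
        v = max(remaining, key=lambda u: f(S | {u}) - f(S))  # item v is considered
        remaining.discard(v)
        if is_independent(S | {v}):
            S = S | {v}                                      # item v is selected
    return S

# Toy usage: a nondecreasing, non-submodular objective under a rank-2 cardinality constraint.
weights = {"a": 3.0, "b": 2.0, "c": 1.5}
f = lambda S: sum(weights[v] for v in S) ** 1.1
print(greedy_matroid(set(weights), f, is_independent=lambda S: len(S) <= 2))
```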

3.1 $\gamma$-weakly submodular functions

Our main result is the following theorem, which shows that the greedy algorithm achieves an approximation factor that depends on the rank of the matroid:

Theorem. Given a ground set $V$, a matroid $\mathcal{M} = (V, \mathcal{I})$ with rank $r$, and a nondecreasing $\gamma$-weakly submodular set function $f$, the greedy algorithm, summarized in Algorithm 1, returns a set $S \in \mathcal{I}$ such that

(6)

where is the optimal value.

Proof.

Let be the set of items selected by the greedy algorithm in the first steps, assume , and define . The core of our proof lies in the following key lemma (proven in Appendix A), which shows that, if is large enough, then will be smaller than a factor of :

Lemma. Suppose that , where . Then, it holds that

(7)

where

(8)

Given the above Lemma, we proceed as follows. First, we note that the function is increasing with respect to , it is decreasing with respect to and by definition. Therefore, it follows that

(9)

and thus

(10)

Second, we note that the function is decreasing. The reason is that , because for , we have . Therefore, using , for ,

(11)

where we used the lower bound of from Eq. 9. Next, using that and for all , we have that

(12)

where the last inequality follows from Eq. 11.

Now, if we combine Eq. 11 and Eq. 12, we conclude that . Hence, if we could use Eq. 7 up until the last step (note that, since the function is monotone nondecreasing, the greedy solution has cardinality equal to the rank of the matroid), then we would conclude that . However, since we can only use Eq. 7 whenever , we conclude that there exists some step such that and . Thus,

where we have also used Eq. 10. Finally, since is monotone nondecreasing, it follows that

which concludes the proof. ∎

Corollary. The greedy algorithm enjoys an approximation guarantee of if and if .

Remarks. We would like to acknowledge that the randomized algorithm recently introduced by Chen et al. (2018) enjoys better approximation guarantees for maximizing $\gamma$-weakly submodular functions; however, we do think that the above result has some value. More specifically:

  • Our theoretical result shows that the greedy algorithm, which is deterministic and simpler, does enjoy nontrivial theoretical guarantees. These theoretical guarantees support its strong empirical performance in several applications (e.g., tree-structured Gaussian graphical model estimation).

  • If , the approximation factors of the greedy algorithm and of the algorithm by Chen et al. (2018) are of the same order.

  • The proof technique used in Theorem 3.1 is novel and may be useful in proving better approximation factors for other randomized algorithms for maximizing non-submodular set functions.

3.2 $\alpha$-submodular functions

Our main result is the following theorem, which shows that the greedy algorithm achieves an approximation factor that is independent of the rank of the matroid:

Theorem. Given a ground set $V$, a matroid $\mathcal{M} = (V, \mathcal{I})$, and a nondecreasing $\alpha$-submodular set function $f$, the greedy algorithm returns a set $S \in \mathcal{I}$ whose value is within a constant factor, depending only on the generalized curvature $\alpha$, of the optimal value.

Proof.

Let be the optimal set of items and be the set of items selected by the greedy algorithm. Moreover, let be the items considered by the algorithm in the first steps, be the items selected by the greedy algorithm in the first steps in order of their consideration, and be the items in considered by the greedy algorithm in the first steps also in order of their consideration.

According to the definition of the greedy algorithm, adding any element from to violates the matroid constraint (otherwise, that element would have been picked by the greedy algorithm). Thus, , which implies . Moreover, implies ; therefore, . However, and are both independent sets of the matroid , because they are both feasible solutions. As a result, it follows that , , and thus . Moreover, this implies that is considered by the greedy algorithm at some point before . This means that, at the point at which the greedy algorithm picks , does not have a higher marginal gain than , i.e., . Hence, we can write

where, in the first inequality, we have used the monotonicity of $f$ and, in the second inequality, we have used the $\alpha$-submodularity of $f$. This concludes the proof. ∎

Remarks. We would like to highlight that the above proof differs significantly from that of Theorem 2.1 in Nemhauser et al. (1978), which is considerably more involved. More specifically, in our proof, we cannot apply Proposition 2.2 in Nemhauser et al. (1978) because the marginal gains of the added elements are no longer guaranteed to be decreasing in the absence of submodularity. As a result, our proof does not resort to linear programming duality, and it generalizes to the case of an intersection of matroids instead of a single matroid.

4 Applications

In this section, we consider several real-world applications and their corresponding $\gamma$-weakly submodular and $\alpha$-submodular functions and matroid constraints. We demonstrate that the submodularity ratio and the generalized curvature can be bounded and, as a result, our approximation guarantees are applicable.

4.1 $\gamma$-weakly submodular functions

Tree-structured Gaussian graphical models. Gaussian graphical models (GGMs) are widely used in many applications, e.g., gene regulatory networks (Friedman et al., 2000; Friedman, 2004; Irrthum et al., 2010). A GGM is typically characterized by means of the sparsity pattern of the inverse of its covariance matrix. Here, we look at the maximum likelihood estimation (MLE) problem for tree-structured GGMs (Tan et al., 2010) from the perspective of $\gamma$-weakly submodular maximization. More specifically, given a set of -dimensional samples and lower and upper bounds on the eigenvalues of the true covariance matrix, i.e., , we can readily rewrite the MLE problem as:

(13)

with

where is the set of all trees with vertices, is the set of positive definite matrices whose sparsity pattern is given by a tree in , and note that, for a fixed , the optimization problem that defines is convex with respect to . Then, the following theorem (proven by Elenberg et al. (2016)) and proposition (proven in Appendix B) characterize the submodularity ratio of :

Theorem.

Let be a concave function with curvature-based bounds

and define the set function as

Then, the submodularity ratio of is .

Proposition. The function satisfies Theorem 4.1 for and and, as a result, its submodularity ratio is .

Social welfare allocation. Social welfare maximization has been studied extensively in the context of combinatorial auctions (Feige, 2009; Feige and Vondrak, 2006; Mirrokni et al., 2008; Vondrák, 2008). In a popular variant of this problem, given a set of items and players, each of them with a monotone utility function, the goal is to partition the items into disjoint subsets that maximize the social welfare. Since the partition of the items can be viewed as a matroid constraint, i.e., for any valid partitioning, each item is assigned to exactly one player, the above formulation reduces to the problem of maximizing a set function subject to a matroid constraint, as illustrated by the sketch below. In this context, Calinescu et al. (2011) have proposed a polynomial-time algorithm with approximation guarantees whenever the utility functions are submodular. Here, since the sum of $\gamma$-weakly submodular functions is also $\gamma$-weakly submodular, our results imply that the greedy algorithm enjoys approximation guarantees whenever the utility functions are $\gamma$-weakly submodular. This includes natural utility functions of the form , where is the amount of item allocated to player and is a strongly concave function that satisfies the curvature-based bounds in Theorem 4.1.
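As a concrete illustration of this reduction (our own sketch with hypothetical values, not from the paper), the ground set can be taken to be all (item, player) pairs, the partition matroid allows each item to be assigned to at most one player, and the objective sums the players' utilities, here a square-root (concave) function of the value each player receives; the resulting instance can be passed directly to a greedy routine like the one sketched after Algorithm 1.

```python
import math

items, players = ["i1", "i2", "i3"], ["p1", "p2"]
V = {(i, p) for i in items for p in players}   # ground set: (item, player) pairs
value = {("i1", "p1"): 3.0, ("i2", "p1"): 1.0, ("i3", "p1"): 1.0,
         ("i1", "p2"): 1.0, ("i2", "p2"): 2.0, ("i3", "p2"): 2.0}

def is_independent(S):
    """Partition matroid: each item is assigned to at most one player."""
    assigned = [i for (i, _) in S]
    return len(assigned) == len(set(assigned))

def social_welfare(S):
    """Sum of per-player utilities, each a concave function of the allocated value."""
    return sum(math.sqrt(sum(value[e] for e in S if e[1] == q)) for q in players)

print(social_welfare({("i1", "p1"), ("i2", "p2")}))  # sqrt(3) + sqrt(2)
```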

LPs with combinatorial constraints. In recent work, Bian et al. (2017) have shown that linear programs with combinatorial constraints, which appear in the context of inventory optimization, can be reduced to maximizing $\gamma$-weakly submodular functions. More specifically, define a set function , where is a polytope and is a given vector. Then, they have shown that has a non-zero submodularity ratio that depends on the polytope . However, they only impose simple cardinality constraints. Our results imply that the greedy algorithm also enjoys approximation guarantees under a general matroid constraint.

4.2 $\alpha$-submodular functions

Visibility optimization in link recommendation. In the context of viral marketing, a recent line of work (Karimi et al., 2016; Upadhyay et al., 2018; Zarezade et al., 2018, 2017) has developed a variety of algorithms to help users in a social network maximize the visibility of the stories they post. More specifically, these algorithms find the best times for these users, the broadcasters, to share stories with their followers so that the stories elicit the greatest attention. Motivated by this line of work and by recent calls for fairness of exposure in social networks (Biega et al., 2018; Singh and Joachims, 2018), we consider the following visibility optimization problem in the context of link recommendation (Lü and Zhou, 2011): given a set of candidate links provided by a link recommendation algorithm, the goal is to find the subset of these links that maximizes the average visibility that a set of broadcasters achieve with respect to the (new) followers induced by these links (that is, the followers the broadcasters would gain if the links were added), under constraints on the maximum number of links per broadcaster. More specifically, we can show that this problem reduces to maximizing an $\alpha$-submodular function (average visibility) under a partition matroid constraint (number of links per broadcaster), where the generalized curvature can be analytically bounded. Due to space constraints, we defer most of the technical details to Appendix C, which also provides additional motivation for the problem, and here we just state the main results.

Formally, let the visibility a broadcaster achieves with respect to the (new) followers induced by the links be measured as the average number of stories posted by her that lie within the top positions of those followers' feeds over time. Here, for simplicity, each follower's feed ranks stories in inverse chronological order, as in previous work (Karimi et al., 2016; Zarezade et al., 2018, 2017). Moreover, assume that (these assumptions are natural in most practical scenarios, as argued in Appendix C):


  • the intensities (or rates) at which broadcasters post stories and followers receive stories are bounded, i.e., ; and,

  • at each time , the intensity at which each broadcaster posts is lower than a fraction of each of her followers' feed intensity, where is a given constant.

Then, we can characterize the generalized curvature of the average visibility these broadcasters achieve using the following proposition:

Proposition. The generalized curvature of the average top visibility is given by

(14)

where, if , then .

Sensor placement with submodular costs. In sensor placement optimization (Iyer and Bilmes, 2012; Krause and Guestrin, 2005; Krause et al., 2008), the goal is typically to maximize the mutual information between the chosen locations and the unchosen ones, i.e., , while simultaneously minimizing a cost function associated with the chosen locations. Since the mutual information is a submodular function and the costs are also often submodular, e.g., there is typically a discount when purchasing sensors in bulk, the problem can be reduced to maximizing a set function representable as a difference between submodular functions, i.e., , where is a given parameter. Moreover, there may be constraints on the number of sensors in a given geographical area (Powers et al., 2015), which can be represented as partition matroid constraints.

Then, we can characterize the generalized curvature of the resulting objective using the following proposition, which readily follows from Proposition 2:

Proposition. Let be the minimum constant for which

Then, the generalized curvature of the objective is . The above proposition assumes that the marginal gain of the cost function times is always smaller than a fraction of the marginal gain of the mutual information, which in turn imposes an upper bound on the given parameter for which $\alpha$-submodularity holds.

(a) Negative log-likelihood
(b) Edge errors
Figure 1: Negative log-likelihood and edge errors achieved by the greedy algorithm (Greedy; green) and the MST-based state-of-the-art method (MST-based; orange) by Tan et al. (2010). Each point corresponds to the average value across repetitions.

More applications. As pointed out by Iyer and Bilmes (2012), set functions representable as a difference between submodular functions emerge in many more applications, from feature selection, discriminatively structured graphical models, and neural computation to probabilistic inference. In all those applications, the inequality in Eq. 3, which must be satisfied for the functions to be $\alpha$-submodular, has a natural interpretation. For example, in feature selection with a submodular cost model for the features, it simply imposes an upper bound on the penalty parameter that controls the tradeoff between the predictive power of a feature and its cost, similarly to the case of sensor placement.

5 Experiments

In this section, our goal is to show that the greedy algorithm, given by Algorithm 1, achieves competitive performance in practice. To this end, we perform experiments on synthetic and real-world data in two of the applications introduced in Section 4, namely, tree-structured Gaussian graphical model estimation and visibility maximization in link recommendation (we will release an open-source implementation of our algorithm with the final version of the paper).

5.1 Tree-structured Gaussian graphical models

Data description and experimental setup. We experiment with random trees with vertices and edge weights (to generate a random tree, we start with an empty graph and add edges sequentially; at each step, we pick two of the graph's current connected components uniformly at random and connect them with an edge, choosing the end points uniformly at random; the process continues until there is only one connected component). We compare the performance of our estimation procedure, based on Algorithm 1, with the minimum spanning tree (MST) based estimation procedure by Tan et al. (2010), which is the state-of-the-art method, in terms of two performance metrics: negative log-likelihood and edge errors. Here, for our estimation procedure, at each iteration of the greedy algorithm, we solve the corresponding convex problem in Eq. 13 using CVXPY (Diamond and Boyd, 2016). Moreover, we run the estimation procedures using different numbers of samples and, for a fixed number of samples, we repeat the experiment times to obtain reliable estimates of the performance metrics.
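For reference, the convex subproblem solved at each greedy iteration, i.e., the Gaussian MLE restricted to a fixed sparsity pattern, can be written in a few lines of CVXPY. The sketch below is our own, assumes an empirical covariance matrix `S_hat` and a set of allowed off-diagonal entries `edges` (both hypothetical names), and omits the eigenvalue bounds used in the analysis.

```python
import cvxpy as cp
import numpy as np

def tree_restricted_mle(S_hat, edges):
    """Maximize log det(K) - trace(S_hat @ K) over precision matrices K whose
    off-diagonal support is contained in the given edge set."""
    d = S_hat.shape[0]
    K = cp.Variable((d, d), PSD=True)
    constraints = [K[i, j] == 0
                   for i in range(d) for j in range(d)
                   if i != j and (i, j) not in edges and (j, i) not in edges]
    objective = cp.Maximize(cp.log_det(K) - cp.trace(S_hat @ K))
    cp.Problem(objective, constraints).solve()
    return K.value

# Toy usage: 3 variables, allow only the edge (0, 1).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
print(tree_restricted_mle(np.cov(X, rowvar=False), edges={(0, 1)}))
```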

Figure 2: Average visibility achieved by the greedy algorithm (GP), the three heuristics (CP, UP, CUP), and the trivial baseline (Random) using Twitter data. The solid horizontal line shows the median visibility and the box limits correspond to the 25th and 75th percentiles.

Performance. Figure 1 summarizes the results in terms of the two performance metrics, which show that both methods perform comparably, i.e., in terms of negative log-likelihood, our method beats the MST-based method slightly while, in terms of edge error, the MST-based method beats ours. The main benefit of using our greedy algorithm for this application is that, in contrast with the MST-based method, it provides approximation guarantees in terms of likelihood maximization. That being said, our goal here is to demonstrate that the greedy algorithm, which is a generic algorithm, can achieve competitive performance in a structured estimation problem for which a specialized algorithm exists.

5.2 Visibility optimization in link recommendation

Data description and experimental setup. We experiment with data gathered from Twitter as reported in previous work (Cha et al., 2010), which comprises user profiles, (directed) links between users, and (public) tweets. The follow link information is based on a snapshot taken at the time of data collection, in September 2009. Here, we focus on the tweets posted during a two-month period, from July 1, 2009 to September 1, 2009, in order to be able to consider the social graph to be approximately static, and we sample a set of users uniformly at random and record all the tweets they posted.

We compare the performance of the greedy algorithm with a trivial baseline that picks edges uniformly at random and with the same three heuristics we used in the experiments with synthetic data. Then, for , we repeat the following procedure times: (i) we pick uniformly at random a set of users as broadcasters; (ii) for each broadcaster , we pick uniformly at random a set of their followers; (iii) we record all tweets not posted by broadcasters in in the feeds of the users in ; and (iv) we run the greedy algorithm, the heuristics, and the trivial baseline and record the sets each provides. Here, we run all methods using empirical estimates of the relevant quantities, i.e., using Eq. 57 (see Appendix C.4) and using maximum likelihood, computed from the tweets posted during the first month, and we evaluate their performance using empirical estimates computed from the tweets posted during the second month.

Solution quality. Figure 2 summarizes the results by means of box plots, which show that the greedy algorithm consistently beats all heuristics and the trivial baseline. Moreover, we experimented with other parameter settings and found our method to be consistently superior to the alternatives. Appendix C.5 contains additional results using synthetic data.

6 Conclusions

We have shown that a simple variation of the standard greedy algorithm offers approximation guarantees for maximizing non-submodular nondecreasing set functions under a matroid constraint. Moreover, we have identified a particular type of $\gamma$-weakly submodular functions, which we call $\alpha$-submodular functions, for which the greedy algorithm offers a stronger approximation factor that is independent of the rank of the matroid. In addition, we have shown that these approximation guarantees are applicable in several real-world applications, from tree-structured Gaussian graphical models and social welfare allocation to link recommendation.

Our work opens up several interesting avenues for future work. For example, a natural next step would be to incorporate the curvature of a set function, as defined in Bian et al. (2017), into our theoretical analysis. Moreover, it would be very interesting to analyze the tightness of our approximation guarantees and to obtain better upper and lower bounds, or even unbiased estimates, for the submodularity ratio and the generalized curvature in the applications we considered. Finally, it would be worthwhile to extend our analysis to other notions of approximate submodularity (Borodin et al., 2014; Du et al., 2008; Horel and Singer, 2016; Krause et al., 2008).

References

  • J. Altschuler, A. Bhaskara, G. Fu, V. Mirrokni, A. Rostamizadeh, and M. Zadimoghaddam (2016) Greedy column subset selection: new bounds and distributed algorithms. In Proceedings of the 33rd International Conference on Machine Learning.
  • L. Backstrom, E. Bakshy, J. M. Kleinberg, T. M. Lento, and I. Rosenn (2011) Center of attention: how Facebook users allocate attention across friends. In ICWSM.
  • A. A. Bian, J. M. Buhmann, A. Krause, and S. Tschiatschek (2017) Guarantees for greedy maximization of non-submodular functions with applications. In Proceedings of the 34th International Conference on Machine Learning, pp. 498–507.
  • A. J. Biega, K. P. Gummadi, and G. Weikum (2018) Equity of attention: amortizing individual fairness in rankings. In SIGIR.
  • I. Bogunovic, J. Zhao, and V. Cevher (2018) Robust maximization of non-submodular objectives.
  • A. Borodin, D. T. M. Le, and Y. Ye (2014) Weakly submodular functions. arXiv preprint arXiv:1401.6697.
  • G. Calinescu, C. Chekuri, M. Pál, and J. Vondrák (2011) Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing 40 (6), pp. 1740–1766.
  • E. J. Candes, J. K. Romberg, and T. Tao (2006) Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics 59 (8), pp. 1207–1223.
  • M. Cha, H. Haddadi, F. Benevenuto, and K. Gummadi (2010) Measuring user influence in Twitter: the million follower fallacy. In ICWSM.
  • L. Chen, M. Feldman, and A. Karbasi (2018) Weakly submodular maximization beyond cardinality constraints: does randomization help greedy? In ICML.
  • M. Conforti and G. Cornuéjols (1984) Submodular set functions, matroids and the greedy algorithm: tight worst-case bounds and some generalizations of the Rado-Edmonds theorem. Discrete Applied Mathematics 7 (3), pp. 251–274.
  • M. Crawford (2015) The world beyond your head: on becoming an individual in an age of distraction. Farrar, Straus and Giroux.
  • A. Das and D. Kempe (2011) Submodular meets spectral: greedy algorithms for subset selection, sparse approximation and dictionary selection. In Proceedings of the 28th International Conference on Machine Learning.
  • A. De, U. Upadhyay, and M. Gomez-Rodriguez (2019) Temporal point processes. Technical report, Saarland University.
  • S. Diamond and S. Boyd (2016) CVXPY: a Python-embedded modeling language for convex optimization. The Journal of Machine Learning Research 17 (1), pp. 2909–2913.
  • D. Du, R. L. Graham, P. M. Pardalos, P. Wan, W. Wu, and W. Zhao (2008) Analysis of greedy approximations with nonsubmodular potential functions. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 167–175.
  • E. R. Elenberg, R. Khanna, A. G. Dimakis, and S. Negahban (2016) Restricted strong convexity implies weak submodularity. arXiv preprint arXiv:1612.00804.
  • U. Feige and J. Vondrak (2006) Approximation algorithms for allocation problems: improving the factor of 1-1/e. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06), pp. 667–676.
  • U. Feige (2009) On maximizing welfare when utility functions are subadditive. SIAM Journal on Computing 39 (1), pp. 122–142.
  • Y. Filmus and J. Ward (2012) A tight combinatorial algorithm for submodular maximization subject to a matroid constraint. In 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, pp. 659–668.
  • M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey (1978) An analysis of approximations for maximizing submodular set functions—II. In Polyhedral Combinatorics, pp. 73–87.
  • N. Friedman, M. Linial, I. Nachman, and D. Pe'er (2000) Using Bayesian networks to analyze expression data. Journal of Computational Biology 7 (3-4), pp. 601–620.
  • N. Friedman (2004) Inferring cellular networks using probabilistic graphical models. Science 303 (5659), pp. 799–805.
  • S. Fujishige (2005) Submodular functions and optimization. Vol. 58, Elsevier.
  • M. Gomez-Rodriguez, K. P. Gummadi, and B. Schoelkopf (2014) Quantifying information overload in social media and its impact on social contagions. In ICWSM.
  • M. E. Halabi, F. Bach, and V. Cevher (2018) Combinatorial penalties: which structures are preserved by convex relaxations?
  • C. Harshaw, M. Feldman, J. Ward, and A. Karbasi (2019) Submodular maximization beyond non-negativity: guarantees, fast algorithms, and applications. In ICML.
  • H. Hassani, M. Soltanolkotabi, and A. Karbasi (2017) Gradient methods for submodular maximization. In NIPS.
  • N. Hodas and K. Lerman (2012) How visibility and divided attention constrain social contagion. In SocialCom.
  • T. Horel and Y. Singer (2016) Maximization of approximately submodular functions. In Advances in Neural Information Processing Systems, pp. 3045–3053.
  • A. Irrthum, L. Wehenkel, P. Geurts, et al. (2010) Inferring regulatory networks from expression data using tree-based methods. PloS ONE 5 (9), pp. e12776.
  • R. Iyer and J. Bilmes (2012) Algorithms for approximate minimization of the difference between submodular functions, with applications. arXiv preprint arXiv:1207.0560.
  • J. Kang and K. Lerman (2015) VIP: incorporating human cognitive biases in a probabilistic model of retweeting. In ICSC.
  • M. Karimi, E. Tavakoli, M. Farajtabar, L. Song, and M. Gomez-Rodriguez (2016) Smart broadcasting: do you want to be seen? In KDD.
  • A. Krause and C. E. Guestrin (2005) Near-optimal nonmyopic value of information in graphical models. In UAI.
  • A. Krause, A. Singh, and C. Guestrin (2008) Near-optimal sensor placements in Gaussian processes: theory, efficient algorithms and empirical studies. Journal of Machine Learning Research 9, pp. 235–284.
  • B. Lehmann, D. Lehmann, and N. Nisan (2006) Combinatorial auctions with decreasing marginal utilities. Games and Economic Behavior 55 (2), pp. 270–296.
  • K. Lerman and T. Hogg (2014) Leveraging position bias to improve peer recommendation. PloS ONE 9 (6), pp. e98914.
  • L. Lü and T. Zhou (2011) Link prediction in complex networks: a survey. Physica A: Statistical Mechanics and its Applications.
  • V. Mirrokni, M. Schapira, and J. Vondrák (2008) Tight information-theoretic lower bounds for welfare maximization in combinatorial auctions. In Proceedings of the 9th ACM Conference on Electronic Commerce, pp. 70–77.
  • M. Narasimhan and J. A. Bilmes (2012) A submodular-supermodular procedure with applications to discriminative structure learning. arXiv preprint arXiv:1207.1404.
  • G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher (1978) An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming 14 (1), pp. 265–294.
  • T. Powers, D. W. Krout, and L. Atlas (2015) Sensor selection from independence graphs using submodularity. In 2015 18th International Conference on Information Fusion (Fusion), pp. 333–337.
  • A. Schrijver (2003) Combinatorial optimization: polyhedra and efficiency. Vol. 24, Springer Science & Business Media.
  • A. Singh and T. Joachims (2018) Fairness of exposure in rankings. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2219–2228.
  • N. Spasojevic, Z. Li, A. Rao, and P. Bhattacharyya (2015) When-to-post on social networks. In KDD.
  • V. Y. Tan, A. Anandkumar, and A. S. Willsky (2010) Learning Gaussian tree models: analysis of error exponents and extremal structures. IEEE Transactions on Signal Processing 58 (5), pp. 2701–2714.
  • U. Upadhyay, A. De, and M. Gomez-Rodriguez (2018) Deep reinforcement learning of marked temporal point processes. In NeurIPS.
  • J. Vondrák (2008) Optimal approximation for the submodular welfare problem in the value oracle model. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, pp. 67–74.
  • J. Vondrák (2010) Submodularity and curvature: the optimal algorithm.
  • A. Zarezade, A. De, U. Upadhyay, H. Rabiee, and M. Gomez-Rodriguez (2018) Steering social activity: a stochastic optimal control point of view. JMLR.
  • A. Zarezade, U. Upadhyay, H. Rabiee, and M. Gomez-Rodriguez (2017) RedQueen: an online algorithm for smart broadcasting in social networks. In WSDM.

Appendix A Proof of Lemma 3.1

Let be the optimal set of items and be the set of items selected by the greedy algorithm. Moreover, let be the items selected by the greedy algorithm in the first steps and be the items in considered by the greedy algorithm in the first steps, in order of their consideration in the algorithm. Then, we first state the following facts, which we will use throughout the proof:


  • and have cardinality equal to the rank of the matroid, i.e., . This follows from the monotonicity of the function .

  • For any , it readily follows that . This follows from the proof of Theorem 3.2.

  • There is a subset with cardinality such that . This follows from the definition of a matroid.

  • For all ,

    (15)

    This holds because, at step , and are not considered yet and the greedy algorithm selects . Therefore, must have a higher marginal gain than .

  • Let , with . Then, for all and for all ,

    (16)

    This holds because none of the items in are considered by the greedy algorithm in the first steps.

Now, we can use Eq. 15 and the fact that for to upper bound for all :

(17)

Using the same technique, if we choose an arbitrary order on the elements of , we can also upper bound for all :

(18)

Then, it follows from Eqs. 17 and 18 that:

(19)

In what follows, we will upper bound the sum in the above equation. To this end, we assume there exists such that it holds that . We will specify the value of later. Then, it follows that and, using that ,

(20)

Next, consider the geometric series and notice that for , it holds that

Then, it follows that, for each , there exists at least one such that . Hence, we are ready to derive the upper bound we were looking for:

Moreover, using the above results, we can derive an upper bound on :

Therefore, if we define , then

At this point, we find the value of such that our assumption holds. For that, note that it is sufficient to prove that . Hence, it is sufficient to prove that

To this end, we start by rewriting the above equation in terms of :

(21)

In the above, it is easy to check that . Moreover, if we fix , then the left hand side is minimized whenever . Then, it is sufficient to prove that