We study the problem of ranking with submodular valuations. An instance of this problem consists of a ground set $[m]$ and a collection of $n$ monotone submodular set functions $f^1, \ldots, f^n$, where each $f^i : 2^{[m]} \to \mathbb{R}_+$. An additional ingredient of the input is a weight vector $w \in \mathbb{R}^n_+$. The objective is to find a linear ordering of the ground set elements that minimizes the weighted cover time of the functions. The cover time of a function is the minimal number of elements in the prefix of the linear ordering that form a set whose corresponding function value reaches a unit threshold value.
Our main contribution is an $O(\ln(1/\epsilon))$-approximation algorithm for the problem, where $\epsilon$ is the smallest nonzero marginal value that any function may gain from some element. Our algorithm orders the elements using an adaptive residual updates scheme, which may be of independent interest. We also prove that the problem is $\Omega(\ln(1/\epsilon))$-hard to approximate, unless P = NP. This implies that the outcome of our algorithm is optimal up to constant factors.
1 Introduction
Let $f : 2^{[m]} \to \mathbb{R}_+$ be a set function, where $[m] = \{1, \ldots, m\}$. The function $f$ is submodular iff
$$f(S) + f(T) \ge f(S \cup T) + f(S \cap T)$$
for all $S, T \subseteq [m]$. An alternative definition of submodularity is through the property of decreasing marginal values. Given a function $f$ and a set $S \subseteq [m]$, the function $f_S$ is defined by $f_S(j) = f(S \cup \{j\}) - f(S)$. The value $f_S(j)$ is called the incremental marginal value of element $j$ to the set $S$. The decreasing marginal values property requires that $f_S(j)$ is a nonincreasing function of $S$ for every fixed $j$. Formally, it requires that $f_S(j) \ge f_T(j)$, for all $S \subseteq T$ and $j \in [m]$. Since the amount of information necessary to convey an arbitrary submodular function may be exponential, we assume value oracle access to the function. A value oracle for $f$ allows us to query the value of $f(S)$ for any set $S$. Throughout the rest of the paper, whenever we refer to submodular functions, we also mean normalized and monotone functions. Specifically, we assume that a submodular function $f$ also satisfies $f(\emptyset) = 0$ and $f(S) \le f(T)$ whenever $S \subseteq T$.
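The decreasing marginal values property can be checked by brute force on a small example. The following is a minimal sketch, assuming a hypothetical coverage-style function over three elements; the names `COVERS`, `f`, `marginal`, and `has_decreasing_marginals` are illustrative and do not appear in the paper.

```python
from itertools import combinations

# Hypothetical toy instance: element j covers the items in COVERS[j].
COVERS = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d"}}
UNIVERSE_SIZE = 4


def f(S):
    """Value oracle: fraction of the universe covered by the elements in S."""
    covered = set().union(*(COVERS[j] for j in S))
    return len(covered) / UNIVERSE_SIZE


def marginal(S, j):
    """Incremental marginal value of element j to the set S."""
    return f(set(S) | {j}) - f(S)


def subsets(ground):
    """Yield every subset of the ground set."""
    ground = list(ground)
    for r in range(len(ground) + 1):
        for combo in combinations(ground, r):
            yield set(combo)


def has_decreasing_marginals(ground):
    """Check that marginal(S, j) >= marginal(T, j) whenever S is a subset of T."""
    for T in subsets(ground):
        for S in subsets(T):
            for j in ground:
                if marginal(S, j) < marginal(T, j) - 1e-12:
                    return False
    return True
```

The check enumerates all pairs of nested subsets, so it is only practical for very small ground sets; it is meant to illustrate the definition rather than to serve as a test for general value oracles.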
In this paper, we focus our attention on the problem of ranking with submodular valuations. An instance of this problem consists of a ground set $[m]$ and a collection of $n$ monotone submodular set functions $f^1, \ldots, f^n$, where each $f^i : 2^{[m]} \to \mathbb{R}_+$. An additional ingredient of the input is a weight vector $w \in \mathbb{R}^n_+$. The objective is to find a linear ordering of the ground set elements that minimizes the weighted cover time of the functions. The cover time of a function is the minimal number of elements in the prefix of the linear ordering that form a set whose corresponding function value reaches some predetermined threshold. More precisely, the objective is to find a linear ordering $\pi : [m] \to [m]$ that minimizes $\sum_{i=1}^{n} w_i k_i$, where $k_i$ is the cover time of function $f^i$, defined as the minimal index for which $f^i(\{\pi(1), \ldots, \pi(k_i)\}) \ge 1$. Here, $\pi(t)$ stands for the element scheduled at time $t$ according to the linear ordering $\pi$. It is worth noting that defining each cover time with respect to a unit threshold value does not limit the generality of the problem. In particular, given a vector $T \in \mathbb{R}^n_+$, where $T_i$ determines the cover threshold of function $f^i$, one can obtain an equivalent instance by normalizing each $f^i$ with $T_i$, and updating all thresholds to $1$.
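The objective can be made concrete with a short sketch that evaluates a given linear ordering. The helper names (`additive`, `cover_time`, `weighted_cover_time`) and the two toy functions are assumptions introduced for illustration only; the unit threshold follows the problem definition.

```python
def additive(values):
    """Build a monotone additive (hence submodular) function from per-element values."""
    return lambda S: sum(values.get(j, 0.0) for j in S)


def cover_time(f, ordering, threshold=1.0):
    """Smallest prefix length whose function value reaches the threshold, else None."""
    prefix = set()
    for k, element in enumerate(ordering, start=1):
        prefix.add(element)
        if f(prefix) >= threshold - 1e-12:
            return k
    return None


def weighted_cover_time(functions, weights, ordering):
    """Sum of weight times cover time over all functions."""
    total = 0.0
    for f, w in zip(functions, weights):
        k = cover_time(f, ordering)
        if k is None:
            raise ValueError("ordering never covers one of the functions")
        total += w * k
    return total


# Hypothetical instance: f1 is covered by element 0 alone,
# f2 needs both elements 1 and 2.
f1 = additive({0: 1.0})
f2 = additive({1: 0.5, 2: 0.5})
cost_a = weighted_cover_time([f1, f2], [2.0, 1.0], [0, 1, 2])
cost_b = weighted_cover_time([f1, f2], [2.0, 1.0], [1, 2, 0])
```

In this toy instance, scheduling element 0 first serves the heavier function early and yields the smaller cost, which illustrates why the order of elements matters when different functions prefer different orderings.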
1.1 Our results
Our main contribution is an $O(\ln(1/\epsilon))$-approximation algorithm for ranking with submodular valuations, where $\epsilon$ is the smallest nonzero marginal value that any function may gain from some element. We note that elements can have a marginal value of zero. Our algorithm orders the ground set elements using an adaptive residual updates scheme, which iteratively selects an element that has a maximal marginal contribution with respect to an appropriately defined residual cover of the functions. This approach has similarities with the well-known multiplicative weights method (see, e.g., [29, 17]). Our algorithm is motivated by the observation that the natural greedy algorithm, which iteratively selects an element based on its absolute marginal contribution to the cover of the functions, performs poorly. In particular, a greedy-type algorithm misjudges elements with low marginal contribution as less important, and therefore unwisely schedules them late.
We also establish that ranking with submodular valuations is $\Omega(\ln(1/\epsilon))$-hard to approximate, assuming that P $\neq$ NP. This implies that the outcome of our algorithm is optimal up to constant factors. This result is attained by demonstrating that the restricted setting of our problem in which there is a single function to cover already incorporates the set cover problem as a special instance. We would like to emphasize that even though this single function setting captures the computational hardness of the problem, it does not capture its algorithmic essence. The main algorithmic challenge that is addressed by our scheme is to obtain a good performance guarantee when there are many functions, each of which has a different linear order that best suits its needs. In particular, one can easily validate that the natural greedy algorithm is essentially optimal in the single function setting. One additional interesting consequence of this result is that our problem generalizes both the set cover problem and its min-sum variant. This is the first problem formulation that has this property.
1.2 Applications
Web search ranking. One impetus for studying the above problem is an application in web search ranking. Web search has become an important part of the daily lives of many people. Recently, there has been great interest in incorporating user behavior into web search ranking. Essentially, these studies make an effort to personalize the web search results (see, e.g., [33, 1, 13, 14]). However, in the absence of any explicit knowledge of user intent, one has to focus on how to produce a set of diversified results that properly account for the interests of the overall user population [12, 2]. In particular, it seems natural to utilize logs of previous search sessions, and try to minimize the average effort of all users in finding the web pages that satisfy their needs. When performing web search, a user usually reads the result items from top to bottom [27]. The time a user spends on reading the result items is the overhead in web search.
The problem of ranking with submodular valuations can model the above-mentioned scenario as follows: there is a set of search result items and there are user types of known proportion. Each user type has a submodular relevance function that quantifies the information that the user type gains from inspecting any subset of result items. The goal is to order the result items in a way that minimizes the average effort of the user types. The effort of a user type is the number of result items it has to review until it gains a critical mass of relevant information. Notice that submodularity naturally suits the ranking application since the information in different result items overlaps and does not necessarily complement each other.
Broadcast in mobile networks. Another application that can be modeled by ranking with submodular valuations is broadcast in mobile networks. In this scenario, there is a base station that needs to sequentially transmit a set of data segments. In addition, there is a collection of clients, each of which is interested in some individual target data. Each data segment contains a mix of information that may be relevant to a number of clients. The amount of information depends both on the data segment and the client. Moreover, there is informational redundancy between different data segments. This allows clients to extract their relevant target data from different subsets of segments. A client can extract her target data once she receives sufficient relevant information from the data segments. The goal is to set an order for the transmission of the data segments that minimizes the average latency of the clients. The latency of a client is the earliest time in which she receives data segments that contain enough information to decode her target data. Notice that the amount of relevant information that each client extracts from the data segments is submodular.
1.3 Previous work on special cases
The problem of ranking with submodular valuations extends the multiple intents ranking problem [6]. One can demonstrate that an input instance for the latter problem can be translated to an instance of ranking with submodular valuations in which each function is linear, and the value that the function has for any element is either $0$ or some value common to that function. The multiple intents ranking problem is known to admit a constant approximation by the work of Bansal, Gupta and Krishnaswamy [7]. Specifically, they presented a clever randomized LP rounding algorithm that improved upon a previous logarithmic approximation [6]. The min-sum set cover problem can be modelled as a special case of multiple intents ranking in which each function is boolean. The best known result for this problem is a $4$-approximation algorithm that was developed by Feige, Lovász and Tetali [16]. This algorithm was implicit in the work of Bar-Noy et al. [9]. The former paper also proved that a $4$-approximation is best possible, unless P = NP. The minimum latency set cover problem is another special case of multiple intents ranking in which each function has exactly elements with nonnegative value . Hassin and Levin [21] studied this problem, and observed that it can be modeled as a special case of the classic precedence-constrained scheduling problem $1 \mid prec \mid \sum w_j C_j$. The latter problem has various approximation algorithms (see, e.g., the survey [11]). Woeginger [34] demonstrated that the special case derived from minimum latency set cover is as hard to approximate as the general scheduling problem. This implies, in conjunction with a recent work of Bansal and Khot [8], that it is hard to approximate minimum latency set cover to within a factor better than $2$, assuming a variant of the Unique Games Conjecture.
1.4 Other related work
Submodular functions arise naturally in operations research and combinatorial optimization. One of the most extensively studied questions is how to minimize a submodular function. A series of results demonstrate that this task can be performed efficiently, either by the ellipsoid algorithm [20] or through strongly polynomial time combinatorial algorithms [31, 24, 22, 28, 23, 26]. Recently, there has been a surge of interest in understanding the limits of tractability of minimization problems in which the classic linear objective function is replaced by a submodular one (see, e.g., [32, 19, 18, 25]). Notably, these submodular problems are commonly considerably harder to approximate than their linear counterparts. For example, the minimum spanning tree problem, which is polynomial-time solvable with linear cost functions, is hard to approximate with submodular cost functions [18], and the sparsest cut problem, which admits an $O(\sqrt{\log n})$-approximation algorithm when the cost is linear [3], becomes hard to approximate with submodular costs [32]. Our work extends the tools and techniques in this line of research. In particular, our results establish a computational separation of logarithmic order between the submodular setting and the linear setting, which admits a constant factor approximation [5].
2 An Adaptive Residual Updates Scheme
In this section, we develop a deterministic algorithm for the problem under consideration that has an approximation guarantee of $O(\ln(1/\epsilon))$, where $\epsilon$ is the smallest nonzero marginal value that any function may gain from some element. An interesting feature of our algorithm is that it generalizes several previous algorithmic results. For example, if our algorithm is given a multiple intents ranking instance then it behaves like the harmonic interpolation algorithm of [6], if it is given a min-sum set cover instance then it reduces to the constant approximation algorithm of [16], and if it is given a (submodular) set cover instance then it acts like the well-known greedy algorithm for the set cover problem [35]. Nonetheless, it is important to emphasize that all these algorithms use fixed values in their computation; in contrast, our algorithm employs dynamically changing values.
2.1 The algorithm
The adaptive residual updates algorithm, formally described below, works in steps. In each step, the algorithm extends the linear ordering with a nonselected element that maximizes the weighted sum of its corresponding potential values. The potential value of an element for a function is initially equal to the marginal value of that element with respect to the empty set. As the algorithm progresses, it is adaptively updated with respect to the selected elements and the residual cover of the function, as formally presented in line 8. Intuitively, this update rule gives more influence to values corresponding to functions whose cover draws near their thresholds. We emphasize that this dynamic update rule is different from that of the exponential weights and harmonic interpolation techniques. Also note that our adaptive residual updates scheme is motivated by the observation that the natural greedy algorithm, which orders elements based on their absolute marginal contribution, fails to provide a good approximation. This insufficiency is exhibited in Appendix A.1.
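The selection rule described above can be sketched as follows. The formal listing and its update line are not reproduced in this text, so the potential used here is an assumption: the marginal value of an element, capped at and divided by the residual cover of the function (one minus its current value), and zero for functions that are already covered. This choice is inferred from the later description of truncated instances, where marginal values are normalized by the residual threshold. The names `additive` and `adaptive_residual_order` are hypothetical.

```python
def additive(values):
    """Build a monotone additive (hence submodular) function from per-element values."""
    return lambda S: sum(values.get(j, 0.0) for j in S)


def adaptive_residual_order(functions, weights, ground):
    """Order elements by maximal weighted sum of (assumed) potential values."""
    ordering, chosen = [], set()
    remaining = list(ground)
    while remaining:
        current = [f(chosen) for f in functions]

        def score(j):
            total = 0.0
            for f, w, value in zip(functions, weights, current):
                residual = 1.0 - value
                if residual <= 1e-12:
                    continue  # covered functions contribute zero potential
                gain = f(chosen | {j}) - value
                # Assumed potential: marginal value normalized by the residual cover.
                total += w * min(gain, residual) / residual
            return total

        best = max(remaining, key=score)  # ties go to the earliest remaining element
        ordering.append(best)
        chosen.add(best)
        remaining.remove(best)
    return ordering


# Hypothetical instance: f1 is covered by element 0, f2 by elements 1 and 2 together.
f1 = additive({0: 1.0})
f2 = additive({1: 0.5, 2: 0.5})
order = adaptive_residual_order([f1, f2], [2.0, 1.0], [0, 1, 2])
```

Under this assumed rule, an element that completes a nearly covered function receives a potential close to one even when its absolute marginal value is small, which is the behavior the text contrasts with the natural greedy algorithm.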
2.2 Analysis
In the remainder of this section, we analyze the performance of the algorithm. There are several techniques that we employ. We begin by establishing an interesting algebraic inequality applicable for any monotone function and any arbitrary sequence of element additions. For the purpose of bounding the cost of the solution of our algorithm, we compare it to a collection of solutions induced by the optimal linear ordering applied to truncated instances of the problem. We also utilize and extend the analysis methods presented in [16, 6].
Theorem 2.1.
The adaptive residual updates algorithm constructs a linear ordering whose induced cost is no more than times the optimal one.
Proof.
We begin by introducing the notation and terminology to be used throughout this proof:

Let and be the cost induced by the optimal linear ordering and the linear ordering constructed by the algorithm, respectively.

Let be the final state of the potential values matrix maintained during the algorithm. Notice that the potential values of each column of are the values that the corresponding element had when it was selected by the algorithm.

Let be the indices of the functions that were not covered before step of the algorithm, and let be the relative cost of step of the algorithm. Specifically, . In addition, let be the weighted sum of potential values corresponding to the element selected at step of the algorithm, and . Finally, let be the penalty of step .
We can now reinterpret the cost induced by the linear ordering using the mentioned notation. In particular, one can validate that . The next lemma bounds in terms of .
Lemma 2.2.
Let . Then, , for every .
Proof.
In what follows, we demonstrate that holds for every function . Notice that if we establish this argument then the lemma follows since
where the second equality results by noticing that each function covered before step must have for every . As a result, the only functions that may have strictly positive potential values are those that were not covered before step , namely, those in .
Consider the function , and let us assume that its cover time is . Let be the set of elements ordered up to (and including) step according to . In particular, let . One can verify that the potential values of function satisfy for every , , and for every . Consequently, we get that
where the inequality follows from Claim 2.3. This claim establishes a generic bound which applies to any monotone function and any arbitrary sequence of element additions. The desired bound is obtained by utilizing the claim with respect to the submodular function and the collection of sets . One should also notice that by construction, and that . ∎
Claim 2.3.
Given a monotone function , and a collection of sets such that and then
where .
Proof.
The monotonicity property of the function guarantees that . We can also assume without loss of generality that , since otherwise, the last term in the abovementioned summation must be equal to , and therefore, may be neglected. Now, notice that for any ,
where the first inequality results from the fact that the function is monotonically increasing for . Furthermore, notice that . This simply follows since we know that . Combining the previously stated arguments, we attain that
where the last inequality holds since . ∎
We continue by introducing a collection of histograms. These histograms will be utilized to bound the cost of the algorithm in terms of the cost of the optimal solution.
The optimal solution as a histogram. The histogram that relates to the optimal solution consists of $n$ bars, one for each function. The bar associated with function $f^i$ has a width of $w_i$, while its height is equal to the step in which that function was covered in the optimal solution. A function is regarded as covered at step $t$ if its cover time according to the optimal ordering is $t$. The bars in the histogram are ordered according to the steps in which the corresponding functions were covered in the optimal solution. Notice that this implies that the histogram is nondecreasing. Furthermore, notice that the total width of the histogram is $\sum_{i=1}^{n} w_i$, and the overall area beneath it is exactly the cost induced by the optimal linear ordering.
A collection of truncated solutions as histograms. We now define a collection of histograms, each of which corresponds to a solution of a truncated input instance with respect to some step of the algorithm. Informally, a truncated instance is a relaxation of the input instance that admits better solutions than the optimal solution. As a result, the collection of histograms establishes a connection between the optimal solution and the solution of the algorithm. For the purpose of defining the truncated input instance corresponding to step $t$, let $S$ be the set of elements selected by the algorithm before step $t$. The truncated instance is obtained by incrementally applying the following two modification steps to the underlying input instance:
(i) a set of elements is given for free. We modify the instance by giving all the elements in $S$ for free. The impact of this modification is twofold: first, all the functions that were covered by the algorithm up to that step cannot incur any cost, and basically, they can be removed from the modified instance; second, the threshold of each function $f^i$ that was not covered by the algorithm up to that step decreases by $f^i(S)$, that is, its threshold in the modified instance becomes $1 - f^i(S)$. Now, notice that in order to translate this instance to the canonical form in which all thresholds are equal to $1$, one has to normalize each $f^i$ with the corresponding threshold $1 - f^i(S)$. It is important to note that the marginal values of each function at step $t$ of the algorithm are normalized exactly by this term (to obtain the corresponding potential values).
(ii) the cost of each function is relaxed. It is implicit in our problem definition that each function $f^i$ has a matching cost function $c_i$, which represents the cost that $f^i$ collects per step with respect to its cover. Specifically, letting $x$ indicate the “extent of cover” of function $f^i$, its cost is a simple step function, defined as follows:
$$c_i(x) = \begin{cases} w_i & \text{if } x < 1, \\ 0 & \text{otherwise.} \end{cases}$$
Namely, the function $f^i$ collects a cost of $w_i$ in each step until it is covered. We modify the cost function of each $f^i$ to be continuously decreasing with constant derivative. In particular, the updated cost function of $f^i$ becomes $c_i(x) = w_i (1 - x)$. Notice that this modification implies that even a partial cover of a function decreases its cost per step. The interpretation one should have in mind is that $x$ represents the fraction of covered weight, and once some fraction of weight is covered it stops incurring cost.
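The effect of the relaxation on a single function can be illustrated numerically. The sketch below assumes the relaxed cost is the weight times one minus the extent of cover, and it charges the cost of each step according to the extent of cover before that step; the function names and the sample cover profile are hypothetical.

```python
def step_cost(w, x):
    """Original per-step cost: the full weight until the function is covered."""
    return w if x < 1.0 else 0.0


def relaxed_cost(w, x):
    """Relaxed per-step cost, assumed here to be w * (1 - x)."""
    return w * max(0.0, 1.0 - x)


def total_cost(cost, w, cover_after_step):
    """Total cost, where cover_after_step[t] is the extent of cover after t steps.

    Entry 0 is the extent of cover before any step, and the cost of step t + 1
    is charged according to cover_after_step[t].
    """
    return sum(cost(w, x) for x in cover_after_step)


def histogram_area(w, portions):
    """Area of bars with width w * x and height t, for portions (t, x)."""
    return sum(w * x * t for t, x in portions)


# Hypothetical function of weight 2 that gains 0.25 at step 1 and 0.75 at step 3.
cover = [0.0, 0.25, 0.25, 1.0]
portions = [(1, 0.25), (3, 0.75)]
original = total_cost(step_cost, 2.0, cover)
relaxed = total_cost(relaxed_cost, 2.0, cover)
area = histogram_area(2.0, portions)
```

In this example the original cost equals the weight times the cover time, while the relaxed cost is smaller and coincides with the area of bars whose widths are the covered weight portions and whose heights are the steps in which those portions were covered.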
The crucial observation one should make regarding the resulting truncated instance is that the linear ordering constructed by the optimal algorithm induces a solution for this instance whose matching histogram is nondecreasing and completely contained within the optimal solution histogram. The latter argument is formally presented and proved in the following lemma.
Lemma 2.4.
The optimal linear ordering constructed with respect to the original instance induces a solution for the truncated instance whose matching histogram is completely contained within the optimal solution histogram when aligned to its lower right boundary.
Proof.
Prior to proving this lemma, it is important to note that the histogram built with respect to the induced solution is defined in a slightly different way than the optimal solution histogram. The difference results from the fact that the cost per step of each function decreases as it is covered. Specifically, this truncated solution histogram has the same interpretation of the axes as the optimal solution histogram, and its bars are ordered according to nondecreasing heights. However, the number of bars depends on the underlying solution. For instance, suppose that function was incrementally covered using positive portions such that in steps . This function gives rise to bars in the histogram, where bar has a width of and a height of . Indeed, the total area beneath these bars is the relative cost of function since the cost per step of that function is between steps and , between steps and , and so on. Accordingly, one can validate that
Now, for the purpose of establishing the lemma, we utilize the following simple claim, whose proof appears in Appendix A.2. The claim presents a transformation that may be applied to nondecreasing histograms and does not increase their upper boundary.
Claim 2.5.
Consider a nondecreasing histogram and suppose we modify it by decreasing the height of some part of a bar and then updating its axis position to maintain the nondecreasing property. The resulting histogram is completely contained within the primary histogram.
We demonstrate that the modifications used to generate the truncated instance can be translated into a sequence of the abovementioned transformation that generates the truncated solution histogram from the optimal solution histogram:
(i) an element is given for free. Suppose that the element given for free was scheduled at step of the optimal linear ordering. Notice that all the elements scheduled at steps according to the optimal linear ordering are scheduled at step in the induced solution. This follows as the element under consideration does not appear in the induced linear ordering for the truncated instance. This implies that the cover time of all the functions that are critical with respect to these elements decreases by . We say that an element is critical for a function if that function topped its threshold after element was selected. Accordingly, the height of the corresponding bars in the histogram decreases by . This translates to a sequence of the mentioned transformation. Furthermore, notice that all the functions that were covered up to step according to the optimal linear ordering may be covered in prior steps in the induced solution. This is due to the “free partial cover” that the element induces. In particular, the functions that were covered at step of the optimal linear ordering must be covered in prior steps in the induced solution. The cover time of each of these function may vary depending on the extent of their cover with respect to the element under consideration and previously scheduled elements. Still, it is clear that this cover time must be strictly smaller than in the optimal linear ordering. Consequently, the height of the bars associated with these functions decreases. Again, this translates to a sequence of the mentioned transformation. Figure 1 provides an illustration of this modification.
(ii) The cost of a function is relaxed. Suppose the cost of some function was relaxed, and let us assume that was incrementally covered using positive portions such that in steps of the optimal linear ordering. Notice that as a result of the relaxation, the histogram should consist of bars instead of the single bar corresponding to function . In particular, each bar should have a width of and a height of . This can be interpreted as replacing a single bar having respective width and height of and with bars whose total width is and each has a height of at most . It is easy to verify that this may translate to a sequence of the mentioned transformation. ∎
The solution of the algorithm as a histogram. The histogram that corresponds to the solution generated by the algorithm consists of bars, one for each entry of the potential values matrix. The width of each bar corresponding to entry , covered at step of the algorithm, is its weighted potential value , while its height is the penalty of the corresponding step . Note that an entry is regarded as covered at step of the algorithm if . The bars are ordered according to the steps in which the corresponding entries were covered by the algorithm. Notice that the ordering of an element at step gives rise to bars in the histogram whose total width is . Moreover, note that the total width of the histogram is , which is at least as large as (and maybe much larger than) the total width of the optimal histogram, and that the area beneath the histogram is , which is precisely , as previously noted.
Having all the histograms definitions in place, we are now ready to prove the theorem. We claim that the area beneath the histogram corresponding to the solution of the algorithm is at most times larger than the area beneath the histogram of the optimal solution, where . Let us consider the transformation that shrinks the width and height of each bar of the algorithm’s histogram by a factor of and , respectively. Specifically, after applying the transformation, the bar corresponding to entry covered at step has a width of and a height of . We next argue that this shrunk histogram is completely contained within the optimal solution histogram when aligned with its lower right boundary. Notice that this implies that the area beneath the shrunk histogram is no more than the area beneath the optimal solution histogram, implying that , and therefore, proving the theorem.
For the purpose of establishing this containment argument, let us focus on an arbitrary point in the histogram of the algorithm. We assume without loss of generality that it lies in the bar corresponding to entry covered during step of the algorithm. This implies that the height of is at most , and its distance from the right side boundary is no more than . Let us consider the point , which is the mapping of in the shrunk histogram. Note that the height of is at most , while its distance from the right boundary is at most . In the following, we prove that lies within the truncated solution histogram corresponding to step . This is achieved by demonstrating that the linear ordering induced by the optimal solution for the truncated instance has at least weight to cover by time step . In fact, we establish a more powerful property that states that any ordering of elements of the truncated instance has at least weight to cover by time step . Now, recall that Lemma 2.4 guarantees that the histogram of the truncated solution is completely contained within the optimal solution histogram, and hence, we obtain that must also lie within the histogram of the optimal solution.
Let us concentrate on the set of elements in the truncated instance corresponding to step . We argue that any element selected in any step of any linear ordering cannot reduce the weight by more than . This argument results from the construction of the truncated instance, the submodularity of the functions, and the greedy selection rule of the algorithm. Specifically, the construction of the truncated instance guarantees that the (initial) marginal value of each element for each function is equal to the potential value at step of the algorithm. This relates to the normalization after the set of elements was given for free. Also note that can be interpreted as the fraction of the weight that may be covered when selecting element . This corresponds to the modification of the cost function of each function in the truncated instance to be continuously decreasing with constant derivative. The submodularity of the functions, which involve decreasing marginal values, ensures that the marginal value of each element can only decrease over time, that is, it cannot be greater than in any future step. This implies, in conjunction with the greedy selection rule of the algorithm, which selects the element that maximizes the abovementioned term, that any element selected in any step of any linear ordering cannot cover a weight of more than . Consequently, any linear ordering may cover at most weight by step . Recall that the overall weight of the functions in the truncated instance is exactly , and thus, at least weight is left uncovered. Finally, Lemma 2.2 guarantees that . ∎
3 An Inapproximability Result
In this section, we establish that ranking with submodular valuations is $\Omega(\ln(1/\epsilon))$-hard to approximate, assuming that P $\neq$ NP. This implies that the outcome of the algorithm from Section 2 is optimal up to constant factors. The essence of the proof is showing that our problem incorporates the set cover problem as a special instance. In fact, we demonstrate that even the seemingly simple scenario in which there is only one function to cover already generalizes the set cover problem.
Theorem 3.1.
The ranking with submodular valuations problem cannot be approximated within a factor of $c \ln(1/\epsilon)$, for some constant $c > 0$, unless P = NP.
Proof.
An instance of a set cover problem consists of a ground set $U$ of $n$ elements and a collection of sets $S_1, \ldots, S_m \subseteq U$. The objective is to find a subfamily of the collection of minimum cardinality that covers all elements in $U$. Set cover is known to be NP-hard to approximate within a factor of $\Omega(\ln n)$. In other words, there is a constant $c > 0$ such that approximating set cover in polynomial time within a factor of $c \ln n$ implies P = NP. This result follows by plugging the proof system of Raz and Safra [30], or alternatively, Arora and Sudan [4] into a reduction of Bellare et al. [10] (see also the result of Feige [15], which shows inapproximability under a slightly stronger assumption).
Given a set cover instance, we define an instance of ranking with submodular valuations as follows. There are $m$ elements, each of which corresponds to a set in the collection. Furthermore, there is one submodular set function $f$ whose corresponding weight is $1$. The submodular function is defined as
$$f(S) = \frac{1}{n} \Big| \bigcup_{j \in S} S_j \Big|.$$
Notice that the resulting instance is a valid ranking with submodular valuations instance as the function $f$ is normalized, monotone, and submodular. For example, $f$ satisfies the decreasing marginal values property since
$$f_S(j) = \frac{1}{n} \Big| S_j \setminus \bigcup_{\ell \in S} S_\ell \Big|$$
is a nonincreasing function of $S$ for every fixed $j$.
It is easy to see that a set cover in the original instance can be converted to a linear ordering of the elements in the newly created instance of identical cost. Specifically, one should order the elements that correspond to the sets of the set cover first (in some arbitrary way), and then order the remaining elements. Conversely, it is not difficult to verify that given a linear ordering of the elements, we can perform a similar cost-preserving transformation in the opposite direction. This implies that unless P = NP, it is impossible to approximate the ranking with submodular valuations problem to within a factor of $c \ln n \ge c \ln(1/\epsilon)$, where the inequality holds as $\epsilon$ is the smallest nonzero marginal value, which is clearly at least $1/n$. ∎
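The correspondence between set covers and linear orderings can be illustrated on a small instance. The sketch below assumes the single function is the fraction of the ground set covered by the chosen sets, which is consistent with the bound of one over the ground set size on the smallest nonzero marginal value; the instance and the helper names are hypothetical.

```python
# Hypothetical set cover instance.
UNIVERSE = {1, 2, 3, 4, 5}
SETS = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]


def f(S):
    """Assumed reduction: fraction of the ground set covered by the sets indexed by S."""
    covered = set().union(*(SETS[j] for j in S))
    return len(covered) / len(UNIVERSE)


def cover_time(ordering):
    """Smallest prefix length whose sets cover the whole ground set."""
    prefix = set()
    for k, j in enumerate(ordering, start=1):
        prefix.add(j)
        if f(prefix) >= 1.0:
            return k
    return None


def ordering_from_cover(cover):
    """Schedule the sets of the cover first, then the remaining sets."""
    rest = [j for j in range(len(SETS)) if j not in cover]
    return list(cover) + rest


def cover_from_ordering(ordering):
    """The prefix up to the cover time is a set cover of the same cost."""
    return ordering[:cover_time(ordering)]


ordering = ordering_from_cover([0, 3])
```

Since there is a single function of unit weight, the cost of an ordering is exactly its cover time, so a cover of size two yields an ordering of cost at most two, and the prefix of any ordering up to its cover time is a set cover of the same size.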
4 A Concluding Remark
Incorporating cost into the problem. As previously noted, it is implicit in the definition of the problem that each function that needs to be covered has a matching step function representing the cost that the function collects per step with respect to its cover. Specifically, the step function corresponding to function $f^i$ is determined by its step height $w_i$. It is only natural to consider the generalization in which each $f^i$ has an arbitrary nonincreasing cost function instead of the height parameter $w_i$. One can demonstrate that our techniques can be utilized to solve this variant. The main idea is to reduce the nonincreasing cost function case to the step function case. This can be done by carefully approximating each nonincreasing cost function by a collection of step functions, as schematically described in Figure 2. The full version of the paper will provide a detailed description of this result.
Acknowledgments:
The authors would like to thank Oded Regev for useful discussions on topics related to this paper.
References
 [1] E. Agichtein, E. Brill, and S. T. Dumais. Improving web search ranking by incorporating user behavior information. In Proceedings 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 19–26, 2006.
 [2] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In Proceedings 2nd International Conference on Web Search and Web Data Mining, pages 5–14, 2009.
 [3] S. Arora, E. Hazan, and S. Kale. $O(\sqrt{\log n})$ approximation to sparsest cut in $\tilde{O}(n^2)$ time. SIAM J. Comput., 39(5):1748–1771, 2010.
 [4] S. Arora and M. Sudan. Improved low-degree testing and its applications. Combinatorica, 23(3):365–426, 2003.
 [5] Y. Azar and I. Gamzu. Ranking with unrelated valuations. 2010. Manuscript.
 [6] Y. Azar, I. Gamzu, and X. Yin. Multiple intents reranking. In Proceedings 41st Annual ACM Symposium on Theory of Computing, pages 669–678, 2009.
 [7] N. Bansal, A. Gupta, and R. Krishnaswamy. A constant factor approximation algorithm for generalized min-sum set cover. In Proceedings 21st Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1539–1545, 2010.
 [8] N. Bansal and S. Khot. Optimal long code test with one free bit. In Proceedings 50th Annual IEEE Symposium on Foundations of Computer Science, pages 453–462, 2009.
 [9] A. Bar-Noy, M. Bellare, M. M. Halldórsson, H. Shachnai, and T. Tamir. On chromatic sums and distributed resource allocation. Inf. Comput., 140(2):183–202, 1998.
 [10] M. Bellare, S. Goldwasser, C. Lund, and A. Russell. Efficient probabilistically checkable proofs and applications to approximations. In Proceedings 25th Annual ACM Symposium on Theory of Computing, pages 294–304, 1993.
 [11] C. Chekuri and S. Khanna. Approximation algorithms for minimizing average weighted completion time. In Handbook of Scheduling: Algorithms, Models, and Performance Analysis. CRC Press, 2004.
 [12] C. L. A. Clarke, M. Kolla, G. V. Cormack, O. Vechtomova, A. Ashkan, S. Büttcher, and I. MacKinnon. Novelty and diversity in information retrieval evaluation. In Proceedings 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 659–666, 2008.
 [13] Z. Dou, R. Song, and J.-R. Wen. A large-scale evaluation and analysis of personalized search strategies. In Proceedings 16th International Conference on World Wide Web, pages 581–590, 2007.
 [14] G. Dupret and B. Piwowarski. A user browsing model to predict search engine click data from past observations. In Proceedings 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 331–338, 2008.
 [15] U. Feige. A threshold of ln n for approximating set cover. J. ACM, 45(4):634–652, 1998.
 [16] U. Feige, L. Lovász, and P. Tetali. Approximating min sum set cover. Algorithmica, 40(4):219–234, 2004.
 [17] N. Garg and J. Könemann. Faster and simpler algorithms for multicommodity flow and other fractional packing problems. SIAM J. Comput., 37(2):630–652, 2007.
 [18] G. Goel, C. Karande, P. Tripathi, and L. Wang. Approximability of combinatorial problems with multi-agent submodular cost functions. In Proceedings 50th Annual IEEE Symposium on Foundations of Computer Science, pages 755–764, 2009.
 [19] M. X. Goemans, N. J. A. Harvey, S. Iwata, and V. S. Mirrokni. Approximating submodular functions everywhere. In Proceedings 20th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 535–544, 2009.
 [20] M. Grötschel, L. Lovász, and A. Schrijver. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica, 1(2):169–197, 1981.
 [21] R. Hassin and A. Levin. An approximation algorithm for the minimum latency set cover problem. In Proceedings 13th Annual European Symposium on Algorithms, pages 726–733, 2005.
 [22] S. Iwata. A faster scaling algorithm for minimizing submodular functions. SIAM J. Comput., 32(4):833–840, 2003.
 [23] S. Iwata. Submodular function minimization. Math. Program., 112(1):45–64, 2008.
 [24] S. Iwata, L. Fleischer, and S. Fujishige. A combinatorial strongly polynomial algorithm for minimizing submodular functions. J. ACM, 48(4):761–777, 2001.
 [25] S. Iwata and K. Nagano. Submodular function minimization under covering constraints. In Proceedings 50th Annual IEEE Symposium on Foundations of Computer Science, pages 671–680, 2009.
 [26] S. Iwata and J. B. Orlin. A simple combinatorial algorithm for submodular function minimization. In Proceedings 20th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1230–1237, 2009.
 [27] T. Joachims, L. A. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay. Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search. ACM Trans. Inf. Syst., 25(2), 2007.
 [28] J. B. Orlin. A faster strongly polynomial time algorithm for submodular function minimization. In Proceedings 12th International Conference on Integer Programming and Combinatorial Optimization, pages 240–251, 2007.
 [29] S. A. Plotkin, D. B. Shmoys, and É. Tardos. Fast approximation algorithms for fractional packing and covering problems. Math. Operations Research, 20:257–301, 1995.
 [30] R. Raz and S. Safra. A sub-constant error-probability low-degree test, and a sub-constant error-probability PCP characterization of NP. In Proceedings 29th Annual ACM Symposium on Theory of Computing, pages 475–484, 1997.
 [31] A. Schrijver. A combinatorial algorithm minimizing submodular functions in strongly polynomial time. J. Comb. Theory, Ser. B, 80(2):346–355, 2000.
 [32] Z. Svitkina and L. Fleischer. Submodular approximation: Sampling-based algorithms and lower bounds. In Proceedings 49th Annual IEEE Symposium on Foundations of Computer Science, pages 697–706, 2008.
 [33] J. Teevan, S. T. Dumais, and E. Horvitz. Personalizing search via automated analysis of interests and activities. In Proceedings 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 449–456, 2005.
 [34] G. J. Woeginger. On the approximability of average completion time scheduling under precedence constraints. Discrete Applied Mathematics, 131(1):237–252, 2003.
 [35] L. A. Wolsey. An analysis of the greedy algorithm for the submodular set covering problem. Combinatorica, 2(4):385–393, 1982.
Appendix A Additional Details
In this section, we present details omitted from the main part of the paper.
A.1 The natural greedy algorithm is insufficient
Let us consider the greedy algorithm that is formally described below. This algorithm is built from a sequence of greedy steps that set up the linear ordering. In each step, the algorithm selects a nonselected element that has a maximal marginal contribution to the functions. The contribution of element to each function is the gain that provides towards the threshold of , as formally exhibited in line 8.
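The greedy rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the algorithm listing referred to in the text: the contribution of an element to a function is taken to be the weighted gain towards the unit threshold, truncated at the threshold, and ties are broken in favor of the earlier element. The names `greedy_order` and `linear` are hypothetical, and each function is accessed only through a value oracle.

```python
from fractions import Fraction


def linear(values):
    """Value oracle of a linear function given as a dict element -> value."""
    return lambda chosen: sum((values.get(e, Fraction(0)) for e in chosen),
                              Fraction(0))


def greedy_order(ground, functions, weights):
    """Repeatedly append the unselected element with the largest total gain.

    Assumption: the gain of element e for function f is
    w * (min(f(S + e), 1) - min(f(S), 1)), where S is the set selected so far.
    """
    order, chosen = [], set()
    remaining = list(ground)
    while remaining:
        def gain(e):
            total = Fraction(0)
            for f, w in zip(functions, weights):
                before = min(f(chosen), 1)
                after = min(f(chosen | {e}), 1)
                total += w * (after - before)
            return total

        best = max(remaining, key=gain)  # max() keeps the first maximal element
        order.append(best)
        chosen.add(best)
        remaining.remove(best)
    return order


# Hypothetical toy instance with three elements and two unit-weight functions.
half = Fraction(1, 2)
functions = [linear({0: half, 1: half}), linear({2: Fraction(1)})]
weights = [1, 1]
order = greedy_order([0, 1, 2], functions, weights)
print(order)
```

On this toy input the rule first selects element 2, whose gain of 1 exceeds the gain of 1/2 offered by each of the other two elements.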
Unfortunately, as the following theorem demonstrates, the greedy algorithm fails to provide a good approximation. The shortfall of the algorithm is that it misjudges elements with low marginal contribution as less important, and thus schedules them late. As one may expect, this turns out to be crucial when many functions depend on a single element that has a low marginal contribution.
Theorem A.1.
The cumulative greedy algorithm has an approximation ratio of .
Proof.
We consider an input instance that consists of elements, functions, and a weight vector whose entries are all identical. All the functions in the input instance are assumed to be linear. A function is called linear if there is a valuations vector such that . Accordingly, we represent the functions using the following matrix.
In particular, the th row represents the values that function has for the elements. Note that the size of ’s upper-left nonzero submatrix is , while the size of its lower-right identity submatrix is . Let us analyze the performance of the greedy algorithm on this instance. Notice that in each step, the algorithm extends the linear ordering with a nonselected element (column) whose sum of entries is maximal. This follows since all weights are identical, and the sum of entries of each row of is exactly . Consequently, the algorithm initially orders element , then elements to , and finally, element . The cost of the algorithm is . On the other hand, ordering the elements according to their column number induces a linear ordering whose cost is . ∎
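The gap between the two orderings can be reproduced on a much smaller matrix of the same flavor. The Python sketch below uses a hypothetical toy instance, not the matrix of the proof: three rows depend on a shared element with a low value of 1/4, and three further rows form an identity block. The function `weighted_cover_time` is an assumed helper that evaluates the cost of an ordering for linear functions given as rows, with a function counted as covered once its prefix sum reaches the unit threshold.

```python
from fractions import Fraction


def weighted_cover_time(matrix, weights, order):
    """Sum over rows of weight times the first prefix length whose row sum is >= 1."""
    cost = 0
    for row, w in zip(matrix, weights):
        total = Fraction(0)
        for t, e in enumerate(order, start=1):
            total += row[e]
            if total >= 1:
                cost += w * t
                break
        else:
            raise ValueError("a function is never covered by this ordering")
    return cost


q, r = Fraction(1, 4), Fraction(3, 4)
# Rows are functions and columns are elements; every row sums to exactly 1.
matrix = [
    [r, q, 0, 0, 0],
    [r, q, 0, 0, 0],
    [r, q, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]
weights = [1] * 6

# A maximal-gain rule picks column 0 first (column sum 9/4), then prefers the
# identity columns (gain 1 each) over column 1 (gain 3/4), which comes last.
greedy_like = [0, 2, 3, 4, 1]
by_column = [0, 1, 2, 3, 4]

print(weighted_cover_time(matrix, weights, greedy_like))
print(weighted_cover_time(matrix, weights, by_column))
```

Here the greedy-like ordering costs 24 while the column ordering costs 18, since postponing the low-value column delays all three functions that depend on it.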
A.2 Proof of Claim 2.5
Consider an arbitrary point initially positioned at coordinate in the histogram. Notice that unless this point is in the area removed from the histogram due to the height decrease, it is transformed to a new position as a result of the axis position update described in the claim. We argue that this new position is contained in the primary histogram. If was initially positioned in the bar whose height was decreased, then the argument clearly holds since the corresponding bar is shifted while maintaining the nondecreasing property. Otherwise, the new position must satisfy and . However, this implies that it must be contained in the primary histogram since was initially in the histogram and this histogram is nondecreasing. Figure 3 provides a schematic description of the transformation.