# Scenario Submodular Cover

Nathaniel Grammel ngrammel@nyu.edu
Department of Computer Science and Engineering
NYU Tandon School of Engineering
Brooklyn, NY 11201

Lisa Hellerstein lisa.hellerstein@nyu.edu
Department of Computer Science and Engineering
NYU Tandon School of Engineering
Brooklyn, NY 11201

Devorah Kletenik kletenik@sci.brooklyn.cuny.edu
Department of Computer and Information Science
Brooklyn College, City University of New York
2900 Bedford Avenue
Brooklyn, NY 11210

Patrick Lin plin15@illinois.edu
Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, IL

Partially supported by NSF Grant 1217968.

###### Abstract

Many problems in Machine Learning can be modeled as submodular optimization problems. Recent work has focused on stochastic or adaptive versions of these problems. We consider the Scenario Submodular Cover problem, which is a counterpart to the Stochastic Submodular Cover problem studied by Golovin and Krause (2011). In Scenario Submodular Cover, the goal is to produce a cover with minimum expected cost, where the expectation is with respect to an empirical joint distribution, given as input by a weighted sample of realizations. In contrast, in Stochastic Submodular Cover, the variables of the input distribution are assumed to be independent, and the distribution of each variable is given as input. Building on algorithms developed by Cicalese et al. (2014) and Golovin and Krause (2011) for related problems, we give two approximation algorithms for Scenario Submodular Cover over discrete distributions. The first achieves an approximation factor of $O(\log Qm)$, where $m$ is the size of the sample and $Q$ is the goal utility. The second, simpler algorithm achieves an approximation bound of $O(\log QW)$, where $Q$ is the goal utility and $W$ is the sum of the integer weights. (Both bounds assume an integer-valued utility function.) Our results yield approximation bounds for other problems involving non-independent distributions that are explicitly specified by their support.


## 1 Introduction

Many problems in Machine Learning can be modeled as submodular optimization problems. Recent work has focused on stochastic or adaptive versions of submodular optimization problems, which reflect the need to make sequential decisions when outcomes are uncertain.

The Submodular Cover problem generalizes the classical NP-complete Set Cover problem and is a fundamental problem in submodular optimization. Adaptive versions of this problem have applications to a variety of machine learning problems that require building a decision tree, where the goal is to minimize expected cost. Examples include problems of entity identification (exact learning with membership queries), classification (equivalence class determination), and decision region identification (cf. Golovin and Krause (2011); Golovin et al. (2010); Bellala et al. (2012); Javdani et al. (2014)). Other applications include reducing prediction costs for learned Boolean classifiers, when there are costs for determining attribute values (Deshpande et al. (2014)).

Previous work on the Stochastic Submodular Cover problem assumes that the variables of the input probability distribution are independent. Optimization is performed with respect to this distribution. We consider a new version of the problem, which we call Scenario Submodular Cover, that removes the independence assumption. In this problem, optimization is performed with respect to an input distribution that is given explicitly by its support (with associated probability weights). We give approximation algorithms solving the Scenario Submodular Cover problem over discrete distributions.

Before describing our contributions in more detail, we give some background. In generic terms, an adaptive submodular cover problem is a sequential decision problem where we must choose items one by one from an item set $N = \{1, \ldots, n\}$. Each item has an initially unknown state, which is a member of a finite state set $O$. The state of an item is revealed only after we have chosen the item. We represent a subset of items and their states by a vector $b \in (O \cup \{*\})^n$, where $b_i = *$ if item $i$ is not in the subset, and $b_i$ is the state of item $i$ otherwise. We are given a monotone, submodular utility function $g : (O \cup \{*\})^n \to \mathbb{Z}_{\geq 0}$. It assigns a non-negative integer value to each subset of the items and the value can depend on the states of the items. (The definitions of the terms “monotone” and “submodular,” for state-dependent utility functions, have not been standardized. We define these terms in Section 2. In the terminology used by Golovin and Krause (2011), $g$ is pointwise monotone and pointwise submodular.) There is a non-negative goal utility value $Q$, such that $g(a) = Q$ for all $a \in O^n$. There is a cost associated with choosing each item, which we are given. In distributional settings, we are also given the joint distribution of the item states. We must continue choosing items until their utility value is equal to the goal utility, $Q$. The problem is to determine the adaptive order in which to choose the items so as to minimize expected cost (in distributional settings) or worst-case cost (in adversarial settings).

Stochastic Submodular Cover is an adaptive submodular cover problem, in a distributional setting. In this problem, the state of each item is a random variable, and these variables are assumed to be independent. The distributions of the variables are given as input. Golovin and Krause introduced a simple greedy algorithm for this problem, called Adaptive Greedy, that achieves an approximation factor of $O(\log Q)$. A dual greedy algorithm for the problem, called Adaptive Dual Greedy, was presented and analyzed by Deshpande et al. (2014). These greedy algorithms have been useful in solving other stochastic optimization problems, which can be reduced to Stochastic Submodular Cover through the construction of appropriate utility functions (e.g., Javdani et al. (2014); Chen et al. (2015a); Deshpande et al. (2014); Golovin et al. (2010)).

The problem we study in this paper, Scenario Submodular Cover (Scenario SC), is also a distributional, adaptive submodular cover problem. The distribution is given by a weighted sample, which is provided as part of the input to the problem. Each element of the sample is a vector in $O^n$, representing an assignment of states to the items in $N$. Associated with each assignment is a positive integer weight. The sample and its weights define a joint distribution on $O^n$, where the probability of a vector in the sample is proportional to its weight. (The probability of a vector in $O^n$ that is not in the sample is 0.) As in Stochastic Submodular Cover, the problem is to choose the items and achieve utility $Q$, in a way that minimizes the expected cost incurred. However, because many of the proofs of results for the Stochastic Submodular Cover problem rely on the independence assumption, the proofs do not apply to the Scenario SC problem.

#### Results

We present an approximation algorithm for the Scenario SC problem that we call Mixed Greedy. It uses two different greedy criteria. It is a generalization of an algorithm by Cicalese et al. (2014) for the Equivalence Class Determination problem (which has also been called the Group Identification problem and the Discrete Function Evaluation problem).

The approximation factor achieved by Mixed Greedy for the Scenario SC problem is $O(\frac{1}{\rho} \log Q)$, where $\rho$ is a quantity that depends on the utility function $g$. In the case of the utility function constructed for the Equivalence Class Determination Problem, $\rho$ is constant, but this is not true in general.

We describe a modified version of Mixed Greedy that we call Scenario Mixed Greedy. It works by first constructing a new monotone, submodular utility function $g_S$ from $g$ and the sample, for which $\rho$ is constant. It then runs Mixed Greedy on $g_S$ with goal value $Qm$, where $m$ is the size of the sample. We show that Scenario Mixed Greedy achieves an $O(\log Qm)$ approximation factor for any Scenario SC problem.

Mixed Greedy is very similar to the algorithm of Cicalese et al., and we use the same basic analysis. However, at the heart of their analysis is a technical lemma with a lengthy proof bounding a quantity that they call the “sepcost”. The proof applies only to the particular utility function used in the Equivalence Class Determination problem. We replace this proof with an entirely different proof that applies to the general Scenario SC problem. Our proof is based on the work of Streeter and Golovin (2009) for the Min-Sum Submodular Cover problem.

In addition to presenting and analyzing Mixed Greedy, we also present another algorithm for the Scenario SC problem that we call Scenario Adaptive Greedy. It is a modified version of the Adaptive Greedy algorithm of Golovin and Krause. Scenario Adaptive Greedy is simpler and more efficient than Mixed Greedy, and is therefore likely to be more useful in practice. However, the approximation bound proved by Golovin and Krause for Adaptive Greedy depends on the assumption that $g$ and the distribution defined by the sample weights jointly satisfy the adaptive submodularity property. This is not the case for general instances of the Scenario SC problem. We extend the approach used in constructing $g_S$ to give a simple, generic method for constructing a modified utility function $g_W$, with goal utility $QW$, from $g$, which incorporates the weights on the sample. We prove that utility function $g_W$ and the distribution defined by the sample weights jointly satisfy adaptive submodularity. This allows us to apply the Adaptive Greedy algorithm, and to achieve an approximation bound of $O(\log QW)$ for the Scenario SC problem, where $W$ is the sum of the weights.

Our constructions of $g_S$ and $g_W$ are similar to constructions used in previous work on Equivalence Class Determination and related problems (cf. Golovin et al. (2010); Bellala et al. (2012); Chen et al. (2015a, b)). Our proof of adaptive submodularity uses the same basic approach as used in previous work (see, e.g., Golovin et al. (2010); Chen et al. (2015a, b)), namely showing that the value of a certain function is non-decreasing along a path between two points; however, we are addressing a more general problem and the details of our proof are different.

We believe that our work on Adaptive Greedy should make it easier to develop efficient approximation algorithms for sample-based problems in the future. Previously, using ordinary Adaptive Greedy to solve a sample-based problem involved the construction of a utility function $g$, and a proof that $g$, together with the distribution on the weighted sample, was adaptive submodular. The proof was usually the most technically difficult part of the work (see, e.g., Golovin et al. (2010); Bellala et al. (2012); Javdani et al. (2014); Chen et al. (2015b)). Our construction of $g_W$, and our proof of adaptive submodularity, make it possible to achieve an approximation bound using Adaptive Greedy after proving only submodularity of a constructed $g$, rather than adaptive submodularity of $g$ and the distribution. Proofs of submodularity are generally easier because they do not involve distributions and expected values. Also, the standard OR construction described in Section 2 preserves submodularity, while it does not preserve adaptive submodularity (Chen et al. (2015a)).

Given a monotone, submodular $g$ with goal value $Q$, we can use the algorithms in this paper to immediately obtain three approximation results for the associated Scenario SC problem: running Mixed Greedy with $g$ yields an $O(\frac{1}{\rho} \log Q)$ approximation, running Mixed Greedy with $g_S$ yields an $O(\log Qm)$ approximation, and running Adaptive Greedy with $g_W$ yields an $O(\log QW)$ approximation. By the results of Golovin and Krause (2011), running Adaptive Greedy with $g$ yields an $O(\log Q)$ approximation for the associated Stochastic SC problem.

#### Applications

Our results on Mixed Greedy yield approximation bounds for other problems. For example, we can easily obtain a new bound for the Decision Region Identification problem studied by Javdani et al. (2014), which is an extension of the Equivalence Class Determination problem. Javdani et al. construct a utility function whose value corresponds to a weighted sum of the hyperedges cut in a certain hypergraph. We can define a corresponding utility function whose value is the number of hyperedges cut. This utility function is clearly monotone and submodular. Using Mixed Greedy with this utility function yields an approximation bound of , where is a parameter associated with the problem, and is the size of the input sample for this problem. In contrast, the bound achieved by Javdani et al. is , where is the minimum weight on an assignment in the sample.

We can apply our greedy algorithms to Scenario BFE (Boolean Function Evaluation) problems, which we introduce here. These problems are a counterpart to the Stochastic BFE problems that have been studied in AI, operations research, and in the context of learning with attribute costs (see e.g., Ünlüyurt (2004); Deshpande et al. (2014); Kaplan et al. (2005)). (In the Operations Research literature, Stochastic Function Evaluation is often called Sequential Testing or Sequential Diagnosis.) In a Scenario BFE problem, we are given a Boolean function $f : \{0,1\}^n \to \{0,1\}$. For each $i \in \{1, \ldots, n\}$, we are also given a cost $c_i$ associated with obtaining the value of the $i$th bit of an initially unknown assignment $a \in \{0,1\}^n$. Finally, we are given a weighted sample $S \subseteq \{0,1\}^n$. The problem is to compute a (possibly implicit) decision tree computing $f$, such that the expected cost of evaluating $f$ on $a$, using the tree, is minimized. The expectation is with respect to the distribution defined by the sample weights.

Deshpande et al. (2014) gave approximation algorithms for some Stochastic BFE problems that work by constructing an appropriate monotone, submodular utility function and running Adaptive Greedy. By substituting the sample-based algorithms in this paper in place of Adaptive Greedy, we obtain approximation results for analogous Scenario BFE problems. For example, using Mixed Greedy, we can show that the Scenario BFE problem for -of- functions has an approximation algorithm achieving a factor of approximation, independent of the size of the sample. Details are in Appendix B. Bounds for other functions follow easily using Scenario Mixed Greedy and Scenario Adaptive Greedy. For example, Deshpande et al. (2014) presented an algorithm achieving an approximation for the Stochastic BFE problem for evaluating decision trees of size . Substituting Scenario Mixed Greedy for Adaptive Greedy in this algorithm yields an approximation for the associated Scenario BFE problem.

We note that our Scenario BFE problem differs from the function evaluation problem studied by Cicalese et al. (2014). In their problem, the computed decision tree need only compute $f$ correctly on assignments $a$ that are in the sample, while ours needs to compute $f$ correctly on all $a \in \{0,1\}^n$. To see the difference, consider the problem of evaluating the Boolean OR function, for a sample $S$ consisting of only assignments $a$ with at least one 1. If the tree only has to be correct on $a \in S$, a one-node decision tree that immediately outputs 1 is valid, even though it does not compute the OR function. Also, in Scenario BFE we assume that the function $f$ is given with the sample, and we consider particular types of functions $f$.

#### Organization

We begin with definitions in Section 2. In Section 3, we present an overview of the Mixed Greedy algorithm. We then present Scenario Mixed Greedy in Section 4, followed by Scenario Adaptive Greedy in Section 5.

## 2 Definitions

Let $N = \{1, \ldots, n\}$ be the set of items and $O$ be a finite set of states. A sample is a subset of $O^n$. A realization of the items is an element $a \in O^n$, representing an assignment of states to items, where for $i \in N$, $a_i$ represents the state of item $i$. We also refer to an element of $O^n$ as an assignment.

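We call $b \in (O \cup \{*\})^n$ a partial realization. Partial realization $b$ represents the subset $I = \{i \mid b_i \neq *\}$ of items where each item $i \in I$ has state $b_i$. For $\gamma \in O$, the quantity $b_{i \leftarrow \gamma}$ denotes the partial realization that is identical to $b$ except that $b_i = \gamma$. For partial realizations $b, b' \in (O \cup \{*\})^n$, $b'$ is an extension of $b$, written $b' \succeq b$, if $b'_i = b_i$ for all $b_i \neq *$. We use $b' \succ b$ to denote that $b' \succeq b$ and $b' \neq b$.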
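The notions of partial realization and extension can be made concrete with a small sketch. It assumes a hypothetical encoding that is not part of the paper: a partial realization is a Python tuple, `None` plays the role of the unknown marker, and items are 0-indexed. The helper names `assign`, `extends`, and `strictly_extends` are illustrative only.

```python
from typing import Hashable, Optional, Tuple

# Hypothetical encoding: None stands for the "unknown" marker of a partial realization.
Partial = Tuple[Optional[Hashable], ...]


def assign(b: Partial, i: int, state: Hashable) -> Partial:
    """Return a copy of b in which item i (0-indexed) has the given state."""
    return b[:i] + (state,) + b[i + 1:]


def extends(b_prime: Partial, b: Partial) -> bool:
    """True if b_prime agrees with b on every item whose state is known in b."""
    return all(x is None or x == y for x, y in zip(b, b_prime))


def strictly_extends(b_prime: Partial, b: Partial) -> bool:
    """True if b_prime extends b and differs from it."""
    return extends(b_prime, b) and b_prime != b


b = (None, "on", None)       # only the middle item has a known state
b2 = assign(b, 0, "off")     # additionally fix the first item
```

Here `b2` extends `b`, but `b` does not extend `b2`, since `b` leaves the first item unknown.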

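Let $g : (O \cup \{*\})^n \to \mathbb{Z}_{\geq 0}$ be a utility function. Utility function $g$ has goal value $Q$ if $g(a) = Q$ for all realizations $a \in O^n$.

We define $\Delta g(b, i, \gamma) := g(b_{i \leftarrow \gamma}) - g(b)$.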

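A standard utility function is a set function $f : 2^N \to \mathbb{R}_{\geq 0}$. It is monotone if for all $A \subset B \subseteq N$, $f(A) \leq f(B)$. It is submodular if in addition, for $i \in N - B$, $f(A \cup \{i\}) - f(A) \geq f(B \cup \{i\}) - f(B)$. We extend the definitions of monotonicity and submodularity to a (state-dependent) utility function $g : (O \cup \{*\})^n \to \mathbb{Z}_{\geq 0}$ as follows:

• $g$ is monotone if for $b \in (O \cup \{*\})^n$, $i \in N$ such that $b_i = *$, and $\gamma \in O$, we have $g(b) \leq g(b_{i \leftarrow \gamma})$.

• $g$ is submodular if for all $b, b' \in (O \cup \{*\})^n$ such that $b' \succ b$, $i \in N$ such that $b_i = b'_i = *$, and $\gamma \in O$, we have $\Delta g(b, i, \gamma) \geq \Delta g(b', i, \gamma)$.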
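For very small instances, the two state-dependent conditions above can be checked by exhaustive enumeration. The following is a brute-force sketch, not a procedure from the paper; it assumes partial realizations are tuples with `None` for unknown items, and the names `is_monotone`, `is_submodular`, `count_known`, and `squared` are hypothetical.

```python
from itertools import product


def extends(b_prime, b):
    """True if b_prime agrees with b wherever b is known (None = unknown)."""
    return all(x is None or x == y for x, y in zip(b, b_prime))


def gain(g, b, i, state):
    """Marginal gain of learning that item i has the given state."""
    return g(b[:i] + (state,) + b[i + 1:]) - g(b)


def all_partials(n, states):
    """Enumerate every partial realization over n items."""
    return product(list(states) + [None], repeat=n)


def is_monotone(g, n, states):
    """Check that revealing any unknown item never decreases utility."""
    return all(gain(g, b, i, s) >= 0
               for b in all_partials(n, states)
               for i in range(n) if b[i] is None
               for s in states)


def is_submodular(g, n, states):
    """Check that gains never increase when moving to an extension."""
    for b in all_partials(n, states):
        for b2 in all_partials(n, states):
            if not extends(b2, b):
                continue
            for i in range(n):
                if b[i] is None and b2[i] is None:
                    if any(gain(g, b, i, s) < gain(g, b2, i, s) for s in states):
                        return False
    return True


def count_known(b):
    """Toy utility: number of items whose state is known."""
    return sum(x is not None for x in b)


def squared(b):
    """Toy utility with increasing gains, hence not submodular."""
    return count_known(b) ** 2
```

The enumeration is exponential in the number of items, so this is only useful as a sanity check on toy utility functions.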

Let be a probability distribution on . Let be a random variable drawn from . For and , we define . For such that , we define .

• is adaptive submodular with respect to if for all , such that , such that , and , we have .

Intuitively, we can view as partial information about states of items in a random realization , with meaning the state of item is unknown. Then measures the utility of that information, and is the expected increase in utility that would result from discovering the state of .

For $g$ with goal value $Q$, and $b \in (O \cup \{*\})^n$ and $i \in N$, where $b_i = *$, let $\gamma_{b,i}$ be the state $\gamma \in O$ such that $\Delta g(b, i, \gamma)$ is minimized (if more than one minimizing state exists, choose one arbitrarily). Thus $\gamma_{b,i}$ is the state of item $i$ that would produce the smallest increase in utility, and thus is “worst-case” in terms of utility gain, if we start from $b$ and then discover the state of $i$.

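For fixed $g$ with goal value $Q$, we define an associated quantity $\rho$, as follows:

$$\rho := \min \frac{\Delta g(b, i, \gamma)}{Q - g(b)}$$

where the minimization is over $b, i, \gamma$, where $b \in (O \cup \{*\})^n$ such that $g(b) < Q$, $i \in N$, $b_i = *$, and $\gamma \in O - \{\gamma_{b,i}\}$.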

Intuitively, right before the state of an item is discovered, there is a certain distance from the current utility achieved to the goal utility. When the state of that item is discovered, the distance to goal is reduced by some fraction (or possibly by zero). The size of that fraction can vary depending on the state of the item. In the definition of $\rho$, we are concerned with the value of that fraction, not for the worst-case state (leading to the smallest fraction), but for the next-to-worst-case state. The parameter $\rho$ is the smallest possible value for this fraction, starting from any partial realization, and considering any item whose state is about to be discovered.
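To make the definition of this parameter concrete, the following brute-force sketch computes it for a toy utility function by enumerating all partial realizations. It is an illustration under assumptions, not part of the paper: partial realizations are tuples with `None` for unknown items, the utility is integer-valued, and the names `rho`, `count_known`, and `or_style` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def rho(g, Q, n, states):
    """Smallest ratio gain / (Q - g(b)) over all non-worst-case states.

    Returns None if no candidate exists (for example, with a single state).
    """
    best = None
    for b in product(list(states) + [None], repeat=n):
        if g(b) >= Q:
            continue  # goal already reached, not part of the minimization
        for i in range(n):
            if b[i] is not None:
                continue
            gains = {s: g(b[:i] + (s,) + b[i + 1:]) - g(b) for s in states}
            worst = min(gains, key=gains.get)  # worst-case state, ties broken arbitrarily
            for s, d in gains.items():
                if s == worst:
                    continue
                ratio = Fraction(d, Q - g(b))
                best = ratio if best is None else min(best, ratio)
    return best


def count_known(b):
    """Toy utility: number of known items (goal value n)."""
    return sum(x is not None for x in b)


def or_style(b):
    """Toy utility for two items: goal value 2 is reached once a 1 is seen."""
    return 2 if 1 in b else sum(x is not None for x in b)


rho_count = rho(count_known, 2, 2, (0, 1))
rho_or = rho(or_style, 2, 2, (0, 1))
```

For `count_known` on two binary items, the first reveal closes half of the remaining distance whatever the state, so the minimum ratio is 1/2. For `or_style`, every non-worst-case state closes the full remaining distance, so the ratio is 1.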

An instance of the Scenario SC problem is a tuple $(g, Q, S, w, c)$, where $g : (O \cup \{*\})^n \to \mathbb{Z}_{\geq 0}$ is an integer-valued, monotone submodular utility function with goal value $Q$, $S \subseteq O^n$, $w : S \to \mathbb{Z}^{+}$ assigns a weight to each realization $a \in S$, and $c$ is a cost vector. We consider a setting where we select items without repetition from the set of items $N$, and the states of the items correspond to an initially unknown realization $a \in O^n$. Each time we select an item, the state of the item is revealed. The selection of items can be adaptive, in that the next item chosen can depend on the states of the previous items. We continue to choose items until $g(b) = Q$, where $b$ is the partial realization representing the states of the chosen items.

The Scenario SC problem asks for an adaptive order in which to choose the items (i.e., a strategy), until goal value $Q$ is achieved, such that the expected sum of the costs of the chosen items is minimized. The expectation is with respect to the distribution on $O^n$ that is proportional to the weights on the assignments in the sample: $\Pr[a] = w(a)/W$ if $a \in S$, and $\Pr[a] = 0$ otherwise, where $W = \sum_{a \in S} w(a)$. We call this the sample distribution defined by $S$ and $w$ and denote it by $D_{S,w}$.
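A minimal sketch of the sample distribution, and of the expected utility gain from revealing one item under it, may help fix ideas. It assumes a hypothetical encoding: the weighted sample is a dict from realization tuples to positive integer weights, and `None` marks unknown items. The function names and the toy utility `ones_seen` are illustrative and not taken from the paper.

```python
from fractions import Fraction


def consistent(a, b):
    """True if realization a agrees with partial realization b (None = unknown)."""
    return all(y is None or x == y for x, y in zip(a, b))


def sample_distribution(weights):
    """Probability of each sample realization, proportional to its weight."""
    total = sum(weights.values())
    return {a: Fraction(w, total) for a, w in weights.items()}


def expected_gain(g, b, i, weights):
    """Expected increase in g from revealing item i, given the states in b."""
    live = {a: w for a, w in weights.items() if consistent(a, b)}
    total = sum(live.values())
    if total == 0:
        raise ValueError("no sample realization is consistent with b")
    return sum(Fraction(w, total) * (g(b[:i] + (a[i],) + b[i + 1:]) - g(b))
               for a, w in live.items())


def ones_seen(b):
    """Toy utility: number of items known to be in state 1."""
    return sum(1 for x in b if x == 1)


weights = {(0, 0): 1, (1, 1): 3}
dist = sample_distribution(weights)
```

Exact rational arithmetic (`Fraction`) is used here only to keep the toy numbers readable; floating point would work equally well.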

The strategy corresponds to a decision tree. The internal nodes of the tree are labeled with items $i \in N$, and each such node has one child for each state $\gamma \in O$. Each root-leaf path in the tree is associated with a partial realization $b$ such that for each consecutive pair of nodes $v$ and $v'$ on the path, if $i$ is the label of $v$, and $v'$ is the $\gamma$-child of $v$, then $b_i = \gamma$. If $i$ does not label any node in the path, then $b_i = *$. The tree may be output in an implicit form (for example, in terms of a greedy rule), specifying how to determine the next item to choose, given the previous items chosen and their states. Although realizations $a \notin S$ do not contribute to the expected cost of the strategy, we require the strategy to achieve goal value $Q$ on all realizations $a \in O^n$.

We will make frequent use of a construction that we call the standard OR construction (cf. Guillory and Bilmes (2011); Deshpande et al. (2014)). It is a method for combining two monotone submodular utility functions $g_1$ and $g_2$ defined on $(O \cup \{*\})^n$, and values $Q_1$ and $Q_2$, into a new monotone submodular utility function $g$. For $b \in (O \cup \{*\})^n$,

$$g(b) = Q_1 Q_2 - (Q_1 - g_1(b))(Q_2 - g_2(b))$$

Suppose that on any $a \in O^n$, $g_1(a) = Q_1$ or $g_2(a) = Q_2$. Then $g(a) = Q_1 Q_2$ for all $a \in O^n$.
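The standard OR construction is a one-line formula, and the sketch below simply wraps it as a higher-order function. The function name `or_construction` and the two toy utilities are hypothetical; partial realizations are assumed to be tuples with `None` for unknown items.

```python
def or_construction(g1, Q1, g2, Q2):
    """Combine two utility functions; returns the new function and its goal value."""
    def g(b):
        return Q1 * Q2 - (Q1 - g1(b)) * (Q2 - g2(b))
    return g, Q1 * Q2


def count_known(b):
    """Toy utility with goal value 2 on two items."""
    return sum(x is not None for x in b)


def first_known(b):
    """Toy utility with goal value 1: has the first item been revealed?"""
    return 1 if b[0] is not None else 0


g, Q = or_construction(count_known, 2, first_known, 1)
```

In this toy case the combined function reaches its goal value 2 as soon as either component reaches its own goal, for example once the first item is revealed.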

## 3 Mixed Greedy

The Mixed Greedy algorithm is a generalization of the approximation algorithm developed by Cicalese et al. for the Equivalence Class Determination problem. That algorithm effectively solves the Scenario Submodular Cover problem for a particular “Pairs” utility function associated with Equivalence Class Determination. In contrast, Mixed Greedy can be used on any monotone, submodular utility function $g$.

Following Cicalese et al., we present Mixed Greedy as outputting a decision tree. If the strategy is only to be used on one realization, it is not necessary to build the entire tree. While Mixed Greedy is very similar to the algorithm of Cicalese et al., we describe it fully here so that our presentation is self-contained.

### 3.1 Algorithm

The Mixed Greedy algorithm builds a decision tree for Scenario SC instance $(g, Q, S, w, c)$. The tree is built top-down. It has approximately optimal expected cost, with respect to the sample distribution defined by $S$ and $w$. Each internal node of the constructed tree has $|O|$ children, one corresponding to each state $\gamma \in O$. We refer to the child corresponding to $\gamma$ as the $\gamma$-child.

The Mixed Greedy algorithm works by calling the recursive function MixedGreedy, whose pseudocode we present in Algorithm 1. In the initial call to MixedGreedy, $b$ is set to be equal to $(*, \ldots, *)$. Only the value of $b$ changes between the recursive calls; the other values remain fixed. Each call to MixedGreedy constructs a subtree of the full tree for $g$, rooted at a node $v$ of that tree. In the recursive call that builds the subtree rooted at $v$, $b$ is the partial realization corresponding to the path from the root to $v$ in the full tree: $b_i = \gamma$ if the path includes a node labeled $i$ and its $\gamma$-child, and $b_i = *$ otherwise.

The algorithm of Cicalese et al. for the Equivalence Class Determination problem is essentially the same as our Mixed Greedy algorithm, for $g$ equal to their “Pairs” utility function. (There is one small difference: in their algorithm, the first stage ends right before the greedy step in which the budget would be exceeded, whereas we allow the budget to be exceeded in the last step.) Like their algorithm, our Mixed Greedy algorithm relies on a greedy algorithm for the Budgeted Submodular Cover problem due to Wolsey. We describe Wolsey’s algorithm in detail in Appendix A.1.

If , then MixedGreedy returns an (unlabeled) single node, which will be a leaf of the full tree for . Otherwise, MixedGreedy constructs a tree . It does so by computing a special realization called , and then iteratively using to construct a path descending from the root of this subtree, which is called the backbone. It uses recursive calls to build the subtrees “hanging” off the backbone. The backbone has a special property: for each node in the path, the successor node in the path is the child of , where is the item labeling node .

The construction of the backbone is done as follows. Using subroutine FindBudget, MixedGreedy first computes a lower bound on the minimum additional cost required in order to achieve a portion of the goal value , assuming we start with partial realization (Step 6). This computation is done using the Greedy algorithm of Wolsey (1982) described in Section A.1 in the Appendix.

After calculating , MixedGreedy constructs the backbone in two stages, using a different greedy criterion in each to determine which item to place in the current node. In the first stage, corresponding to the first repeat loop of the pseudocode, the goal is to remove weight (probability mass) from the backbone, as cheaply and as soon as possible. That is, consider a realization to be removed from the backbone (or “covered”) if labels a node in the spine and ; removing from the backbone results in the loss of weight from the backbone. The greedy choice used in the first stage in Step 12 follows the standard rule of maximizing bang-for-the-buck; the algorithm chooses such that the amount of probability mass removed from the backbone, divided by the cost , is maximized. However, in making this greedy choice, it only considers items that have cost at most . The first stage ends as soon as the total cost of the items in the chosen sequence is at least . For each item chosen during the stage, is set to .

In the second stage, corresponding to the second repeat loop, the goal is to increase utility as measured by , under the assumption that we already have , and that the state of each remaining item is . The algorithm again uses the bang-for-the-buck rule, choosing the that maximizes the increase in utility, divided by the cost (Step 23). In making this greedy choice, it again considers only items that have cost at most . The stage ends as soon as the total cost of the items in the chosen sequence is at least . For each item chosen during the stage, is set to .

In Section 2, we defined the value . The way the value is chosen guarantees that the updates to during the two greedy stages cause the value of to shrink by at least a fraction before each recursive call. In Appendix A, we prove this fact and use it to prove the following theorem.

###### Theorem 1

Mixed Greedy is an approximation algorithm for the Scenario Adaptive Submodular Cover problem that achieves an approximation factor of $O(\frac{1}{\rho} \log Q)$.

## 4 Scenario Mixed Greedy

We now present a variant of Mixed Greedy that eliminates the dependence on $\rho$ in the approximation bound in favor of a dependence on $m$, the size of the sample. We call this variant Scenario Mixed Greedy.

Scenario Mixed Greedy works by first modifying $g$ to produce a new utility function $g_S$, and then running Mixed Greedy with $g_S$, rather than $g$. Utility function $g_S$ is produced by combining $g$ with another utility function $h_S$, using the standard OR construction described at the end of Section 2. Here $h_S : (O \cup \{*\})^n \to \mathbb{Z}_{\geq 0}$, where $h_S(b) = m - |\{a \in S : a \succeq b\}|$ and $m = |S|$. Thus $h_S(b)$ is the total number of assignments that have been eliminated from $S$ because they are incompatible with the partial state information in $b$. Utility $m$ for $h_S$ is achieved when all assignments in $S$ have been eliminated. Clearly, $h_S$ is monotone and submodular.

When the OR construction is applied to combine $g$ and $h_S$, the resulting utility function $g_S$ reaches its goal value $Qm$ when all possible realizations of the sample have been eliminated or when goal utility $Q$ is achieved for $g$.
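The construction just described can be sketched in a few lines. The names `make_h_S` and `make_g_S`, and the encoding (sample as a list of tuples, `None` for unknown items), are assumptions made for illustration and are not taken from the paper.

```python
def consistent(a, b):
    """True if realization a agrees with partial realization b (None = unknown)."""
    return all(y is None or x == y for x, y in zip(a, b))


def make_h_S(sample):
    """Counting utility: number of sample realizations eliminated by b."""
    m = len(sample)

    def h_S(b):
        return m - sum(1 for a in sample if consistent(a, b))

    return h_S, m


def make_g_S(g, Q, sample):
    """OR-combine g (goal value Q) with the counting utility (goal value m)."""
    h_S, m = make_h_S(sample)

    def g_S(b):
        return Q * m - (Q - g(b)) * (m - h_S(b))

    return g_S, Q * m


def count_known(b):
    """Toy utility with goal value 2 on two items."""
    return sum(x is not None for x in b)


sample = [(0, 0), (0, 1), (1, 1)]
h_S, m = make_h_S(sample)
g_S, goal = make_g_S(count_known, 2, sample)
```

In this toy instance, the partial realization `(1, 0)` is inconsistent with every sample realization, so the counting utility alone already drives the combined function to its goal value.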

In an on-line setting, Scenario Mixed Greedy uses the following procedure to determine the adaptive sequence of items to choose on an initially unknown realization $a$.

Scenario Mixed Greedy:

1. Construct utility function $g_S$ by applying the standard OR construction to $g$ and utility function $h_S$.

2. Adaptively choose a sequence of items by running Mixed Greedy for utility function $g_S$ with goal value $Qm$, with respect to the sample distribution $D_{S,w}$.

3. After goal value $Qm$ is achieved, if the final partial realization $b$ computed by Mixed Greedy does not satisfy $g(b) = Q$, then choose the remaining items in $N$ in a fixed but arbitrary order until $g(b) = Q$.

The third step in the procedure is present because goal utility $Q$ must be reached for $g$ even on realizations $a$ that are not in $S$.

###### Theorem 2

Scenario Mixed Greedy is an approximation algorithm for the Scenario Submodular Cover problem that achieves an approximation factor of $O(\log Qm)$, where $m$ is the size of sample $S$.

Proof  Scenario Mixed Greedy achieves utility value for when run on any realization , because the computed by Mixed Greedy is such that , and the third step ensures that is reached.

Let and denote the expected cost of the optimal strategies for the Scenario SC problems on and respectively, with respect to the sample distribution . Let be an optimal strategy for achieving expected cost . It is also a valid strategy for the problem on , since it achieves goal utility for on all realizations, and hence achieves goal utility for on all realizations. Thus .

The two functions, and , are monotone and submodular. Since the function is produced from them using the standard OR construction, is also monotone and submodular. Let be the value of parameter for the function . By the bound in Theorem 1, running Mixed Greedy on , for the sample distribution , has expected cost that is at most a factor more than . Its expected cost is thus also within an factor of . Making additional choices on realizations not in , as done in the last step of Scenario Mixed Greedy, does not affect the expected cost, since these realizations have zero probability.

Generalizing an argument from Cicalese et al. (2014), we now prove that is lower bounded by a constant fraction. Consider any and such that , and any where . Let . Since the sets and and and are disjoint, it is not possible for both of them to have size greater than . It follows that or or both. By the construction of , it immediately follows that or or both. Since is the “worst-case” setting for with respect to , it follows that , and so in all cases . Also, . Therefore, . The theorem follows from the bound given in Theorem 1.
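The halving step in this argument can be written out explicitly. The following is a sketch under assumed notation that the text above does not fix: $h_S(b)$ is the number of sample realizations eliminated by $b$, $m = |S|$, $g_S$ is the OR combination of $g$ and $h_S$ with goal value $Qm$, $\tau = m - h_S(b)$ is the number of sample realizations still consistent with $b$, and $\rho_S$ is the parameter $\rho$ for $g_S$.

```latex
% Assumed notation: h_S, g_S, \tau, \rho_S as described in the lead-in.
% Suppose state \gamma eliminates at least half of the remaining sample:
%   \Delta h_S(b,i,\gamma) \ge \tau/2, so m - h_S(b_{i \leftarrow \gamma}) \le \tau/2.
\begin{align*}
Qm - g_S(b_{i \leftarrow \gamma})
  &= \bigl(Q - g(b_{i \leftarrow \gamma})\bigr)\bigl(m - h_S(b_{i \leftarrow \gamma})\bigr) \\
  &\le \bigl(Q - g(b)\bigr)\cdot\frac{\tau}{2}
     && \text{(monotonicity of } g\text{)} \\
  &= \tfrac{1}{2}\bigl(Qm - g_S(b)\bigr), \\
\text{hence}\quad
\Delta g_S(b,i,\gamma) &\ge \tfrac{1}{2}\bigl(Qm - g_S(b)\bigr).
\end{align*}
% For two distinct states, the sets of consistent sample realizations are
% disjoint, so at least one of the two states satisfies the hypothesis.
% The worst-case state has the smallest gain, so every other state \gamma
% satisfies the final inequality, which gives \rho_S \ge 1/2.
```

Under these assumptions the constant works out to $1/2$, which is consistent with the claim that the parameter is bounded below by a constant fraction.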

## 5 Scenario Adaptive Greedy

Scenario Adaptive Greedy works by first constructing a utility function $g_W$, produced by applying the standard OR construction to $g$ and utility function $h_W$. Here $h_W : (O \cup \{*\})^n \to \mathbb{Z}_{\geq 0}$, where $h_W(b) = W - \sum_{a \in S : a \succeq b} w(a)$. Intuitively, $h_W(b)$ is the total weight of assignments that have been eliminated from $S$ because they are incompatible with the partial state information in $b$. Utility $W$ is achieved for $h_W$ when all assignments in $S$ have been eliminated. It is easy to verify that $h_W$ is monotone and submodular. The function $g_W$ reaches its goal value $QW$ when all possible realizations of the sample have been eliminated or when goal utility $Q$ is achieved for $g$. Once $g_W$ is constructed, Scenario Adaptive Greedy runs Adaptive Greedy on $g_W$.

In an on-line setting, Scenario Adaptive Greedy uses the following procedure to determine the adaptive sequence of items to choose on an initially unknown realization $a$.

1. Construct modified utility function $g_W$ by applying the standard OR construction to $g$ and utility function $h_W$.

2. Run Adaptive Greedy for utility function $g_W$ with goal value $QW$, with respect to sample distribution $D_{S,w}$, to determine the choices to make on $a$.

3. After goal value $QW$ is achieved, if the partial realization $b$ representing the states of the chosen items of $a$ does not satisfy $g(b) = Q$, then choose the remaining items in $N$ in arbitrary order until $g(b) = Q$.
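The three steps above can be sketched as a short on-line routine. This is an illustrative sketch under assumptions rather than the paper's implementation: Adaptive Greedy is taken to be the rule that picks the unchosen item maximizing expected utility gain per unit cost under the conditional sample distribution; the sample is a dict from realization tuples to positive integer weights; `None` marks unknown items; and all function names are hypothetical.

```python
from fractions import Fraction


def consistent(a, b):
    """True if realization a agrees with partial realization b (None = unknown)."""
    return all(y is None or x == y for x, y in zip(a, b))


def reveal(b, i, state):
    """Copy of b with item i set to the given state."""
    return b[:i] + (state,) + b[i + 1:]


def scenario_adaptive_greedy(g, Q, weights, costs, truth):
    """Choose items on hidden realization `truth`; return (order chosen, final b)."""
    n = len(costs)
    W = sum(weights.values())

    def h_w(b):  # weight of sample realizations eliminated by b
        return W - sum(w for a, w in weights.items() if consistent(a, b))

    def g_w(b):  # step 1: standard OR construction of g and h_w
        return Q * W - (Q - g(b)) * (W - h_w(b))

    b, chosen = (None,) * n, []

    # Step 2: greedy choices until the combined goal value Q*W is reached.
    while g_w(b) < Q * W:
        live = {a: w for a, w in weights.items() if consistent(a, b)}
        total = sum(live.values())  # positive whenever g_w(b) < Q*W

        def score(i):
            exp_gain = sum(Fraction(w, total) * (g_w(reveal(b, i, a[i])) - g_w(b))
                           for a, w in live.items())
            return exp_gain / costs[i]

        i = max((j for j in range(n) if b[j] is None), key=score)
        b = reveal(b, i, truth[i])
        chosen.append(i)

    # Step 3: if truth was outside the sample, finish in arbitrary (index) order.
    for i in range(n):
        if g(b) >= Q:
            break
        if b[i] is None:
            b = reveal(b, i, truth[i])
            chosen.append(i)
    return chosen, b


def count_known(b):
    """Toy utility with goal value n: number of known items."""
    return sum(x is not None for x in b)


def or_style(b):
    """Toy utility for two items: goal value 2 is reached once a 1 is seen."""
    return 2 if 1 in b else sum(x is not None for x in b)
```

For example, with `or_style`, weights `{(0, 0): 1, (1, 1): 3}`, and costs `[5, 1]`, the routine tests the cheaper second item first. With `count_known` on three items and a hidden realization outside the sample, the third step is what completes the cover.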

In Appendix C, we prove the following lemma.

###### Lemma 3

Utility function $g_W$ is adaptive submodular with respect to sample distribution $D_{S,w}$.

The consequence of Lemma 3 is that we may now use any algorithm designed for adaptive submodular utility functions. This gives us Theorem 4.

###### Theorem 4

Scenario Adaptive Greedy is an approximation algorithm for the Scenario Adaptive Submodular Cover problem that achieves an approximation factor of $O(\log QW)$, where $W$ is the sum of the weights on the realizations in $S$.

Proof  Since $g_W$ is produced by applying the OR construction to $g$ and $h_W$, which are both monotone, $g_W$ is also monotone. By Lemma 3, $g_W$ is adaptive submodular with respect to the sample distribution. Thus by the bound of Golovin and Krause on Adaptive Greedy, running that algorithm on $g_W$ yields an ordering of choices with expected cost that is at most an $O(\log QW)$ factor more than the optimal expected cost for $g_W$. By an argument analogous to the one in the proof of Theorem 2, it follows that Scenario Adaptive Greedy solves the Scenario Submodular Cover problem for $g$, and achieves an approximation factor of $O(\log QW)$.

## Acknowledgments

L. Hellerstein thanks Andreas Krause for useful discussions at ETH, and especially for directing our attention to the bound of Streeter and Golovin for min-sum submodular cover.

## References

• Bellala et al. (2012) G. Bellala, S. Bhavnani, and C. Scott. Group-based active query selection for rapid diagnosis in time-critical situations. IEEE Transactions on Information Theory, 2012.
• Ben-Dov (1981) Y. Ben-Dov. Optimal testing procedure for special structures of coherent systems. Management Science, 1981.
• Chang et al. (1990) M.-F. Chang, W. Shi, and W. K. Fuchs. Optimal diagnosis procedures for k-out-of-n structures. IEEE Transactions on Computers, 39(4):559–564, April 1990.
• Chen et al. (2015a) Yuxin Chen, Shervin Javdani, Amin Karbasi, J. Andrew Bagnell, Siddhartha S. Srinivasa, and Andreas Krause. Submodular surrogates for value of information. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25-30, 2015, Austin, Texas, USA, pages 3511–3518, 2015a.
• Chen et al. (2015b) Yuxin Chen, Shervin Javdani, Amin Karbasi, J. Andrew Bagnell, Siddhartha S. Srinivasa, and Andreas Krause. Submodular surrogates for value of information (long version). 2015b.
• Cicalese et al. (2014) Ferdinando Cicalese, Eduardo Laber, and Aline Medeiros Saettler. Diagnosis determination: decision trees optimizing simultaneously worst and expected testing cost. In Proceedings of The 31st International Conference on Machine Learning, pages 414–422, 2014.
• Deshpande et al. (2014) A. Deshpande, L. Hellerstein, and D. Kletenik. Approximation algorithms for stochastic Boolean function evaluation and stochastic submodular set cover. In Symposium on Discrete Algorithms, 2014.
• Golovin and Krause (2011) D. Golovin and A. Krause. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. Journal of Artificial Intelligence Research, 42:427–486, 2011.
• Golovin et al. (2010) D. Golovin, A. Krause, and D. Ray. Near-optimal Bayesian active learning with noisy observations. In 24th Annual Conference on Neural Information Processing Systems (NIPS), pages 766–774, 2010.
• Guillory and Bilmes (2011) Andrew Guillory and Jeff A. Bilmes. Simultaneous learning and covering with adversarial noise. In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28 - July 2, 2011, pages 369–376, 2011.
• Javdani et al. (2014) Shervin Javdani, Yuxin Chen, Amin Karbasi, Andreas Krause, Drew Bagnell, and Siddhartha S. Srinivasa. Near optimal Bayesian active learning for decision making. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, AISTATS 2014, Reykjavik, Iceland, April 22-25, 2014, pages 430–438, 2014.
• Kaplan et al. (2005) H. Kaplan, E. Kushilevitz, and Y. Mansour. Learning with attribute costs. In Symposium on the Theory of Computing, pages 356–365, 2005.
• Salloum (1979) S. Salloum. Optimal testing algorithms for symmetric coherent systems. PhD thesis, University of Southern California, 1979.
• Salloum and Breuer (1984) S. Salloum and M. Breuer. An optimum testing algorithm for some symmetric coherent systems. Journal of Mathematical Analysis and Applications, 101(1):170–194, 1984. ISSN 0022-247X.
• Skutella and Williamson (2011) Martin Skutella and David P. Williamson. A note on the generalized min-sum set cover problem. Operations Research Letters, 39(6):433–436, 2011.
• Streeter and Golovin (2009) Matthew Streeter and Daniel Golovin. An online algorithm for maximizing submodular functions. In Advances in Neural Information Processing Systems, pages 1577–1584, 2009.
• Ünlüyurt (2004) Tonguç Ünlüyurt. Sequential testing of complex systems: a review. Discrete Applied Mathematics, 142(1-3):189–205, 2004.
• Wolsey (1982) Laurence Wolsey. Maximising real-valued submodular functions: Primal and dual heuristics for location problems. Mathematics of Operations Research, 7(3):410–425, 1982.

## Appendix A Proof of Bound for Mixed Greedy

We first discuss the algorithm of Wolsey used in FindBudget.

### A.1 Wolsey’s Greedy Algorithm for Budgeted Submodular Cover

The Budgeted Submodular Cover problem takes as input a finite set of items, a positive integer called the budget, a monotone submodular set function , and a vector indexed by the items in , such that for all . The problem is to find a subset such that , and is maximized.

Wolsey (1982) developed a greedy approximation algorithm for this problem. We present the pseudocode for this algorithm here, together with Wolsey’s approximation bound.

###### Lemma 5 (Wolsey (1982))

Let be the optimal solution to the Budgeted Submodular Cover problem on instance . Let be the set of items chosen by running Wolsey-Greedy(). Let be the base of the natural logarithm, and let be the solution to . Then .
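
For readers who want something executable, here is a minimal Python sketch of a budgeted greedy procedure in the spirit of Wolsey's algorithm, together with a numerical solution of $e^{\chi} = 2 - \chi$. Several details are assumptions rather than a transcription of Wolsey-Greedy: items are chosen by marginal gain per unit cost until the budget is first exceeded, the result is the better of the last item alone and the items chosen before it, and items costing more than the budget are ignored. The names `wolsey_greedy` and `solve_chi` are hypothetical.

```python
import math
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List


def wolsey_greedy(
    items: Iterable[Hashable],
    g: Callable[[FrozenSet], float],
    cost: Dict[Hashable, float],
    budget: float,
) -> List[Hashable]:
    """Greedy by marginal gain per unit cost, stopping once the budget is exceeded."""
    remaining = sorted(x for x in items if cost[x] <= budget)
    chosen: List[Hashable] = []
    spent = 0.0
    while remaining and spent <= budget:
        base = g(frozenset(chosen))
        best = max(
            remaining,
            key=lambda x: (g(frozenset(chosen + [x])) - base) / cost[x],
        )
        remaining.remove(best)
        chosen.append(best)
        spent += cost[best]
    if spent > budget:
        # The last item pushed the total over the budget: keep whichever of
        # "last item alone" and "everything before it" has higher utility.
        last, prefix = chosen[-1], chosen[:-1]
        return [last] if g(frozenset([last])) >= g(frozenset(prefix)) else prefix
    return chosen


def solve_chi(iterations: int = 60) -> float:
    """Solve e^x = 2 - x on [0, 1] by bisection."""
    lo, hi = 0.0, 1.0  # e^x + x - 2 is negative at 0 and positive at 1
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if math.exp(mid) + mid - 2 < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Numerically, $\chi \approx 0.443$, so a guarantee of the form $1 - e^{-\chi}$ (the form in which Wolsey's bound is usually quoted) evaluates to roughly $0.358$.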

### A.2 Analysis of Mixed Greedy

Consider a Scenario SC instance , and a partial realization . We now consider MixedGreedy(). It constructs a tree for the Scenario SC instance induced by . In this induced instance, the item set is . Without loss of generality, assume that for some . For such that , let be the restriction of to the items in . For , denotes the extension to all elements in such that for and otherwise.

The utility function for the instance induced by is a function on partial realizations of the items in . Specifically, for , . The sample in the induced instance consists of the restrictions of the realizations in to the items in . That is, . Note that each realization in corresponds to a unique realization in . The weight function for the induced instance is such that for all , . The goal value for the induced instance is .
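
The construction of the induced instance can be sketched in Python as follows. This is an illustrative reading under stated assumptions: the induced sample keeps only the realizations consistent with the partial realization `b`, restricted to the items not yet observed; each keeps its original weight; the induced utility is the gain over `g(b)`; and the induced goal is `Q - g(b)`. The function name `induce_instance` and the dictionary encoding (which keeps the original item indices rather than renumbering them) are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Partial = Dict[int, Hashable]


def induce_instance(
    g: Callable[[Partial], int],
    Q: int,
    sample: Sequence[Tuple],
    weights: Sequence[int],
    b: Partial,
):
    """Return (induced utility, induced goal, induced sample, induced weights)."""
    n = len(sample[0]) if sample else 0
    free = [i for i in range(n) if i not in b]  # items not yet observed in b

    induced_sample: List[Partial] = []
    induced_weights: List[int] = []
    for a, w in zip(sample, weights):
        if all(a[i] == state for i, state in b.items()):
            # Restrict a consistent realization to the unobserved items.
            induced_sample.append({i: a[i] for i in free})
            induced_weights.append(w)

    base = g(b)

    def g_induced(d: Partial) -> int:
        # Extend d with the states already fixed by b, then measure the gain.
        return g({**b, **d}) - base

    return g_induced, Q - base, induced_sample, induced_weights
```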

If , then MixedGreedy() returns the optimal tree for the instance induced by , which is a single (unlabeled) leaf with expected cost 0. Assume .

For any decision tree for the induced instance and any realization defined over the item set (or over any superset of ), let , where is the set of items labeling the nodes on the root-leaf path followed in on realization . That is, is the cost incurred when using tree on realization .
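
As a concrete illustration of this definition, the following Python sketch computes the cost of a decision tree on a realization by summing the costs of the items on the root-leaf path, and then averages over a weighted sample. The `Node` class, with `None` standing for an unlabeled leaf, and the function names are hypothetical representations chosen for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Optional, Sequence, Tuple


@dataclass
class Node:
    """Internal node labeled by an item; children are keyed by the item's state."""
    item: int
    children: Dict[Hashable, Optional["Node"]] = field(default_factory=dict)


def path_cost(tree: Optional[Node], a: Tuple, cost: Sequence[float]) -> float:
    """Sum of the costs of the items labeling the root-leaf path followed on a."""
    total = 0.0
    node = tree
    while node is not None:
        total += cost[node.item]
        # A missing child is treated as a leaf.
        node = node.children.get(a[node.item])
    return total


def expected_cost(
    tree: Optional[Node],
    sample: Sequence[Tuple],
    weights: Sequence[float],
    cost: Sequence[float],
) -> float:
    """Expected path cost when a realization is drawn in proportion to its weight."""
    W = sum(weights)
    return sum(w * path_cost(tree, a, cost) for a, w in zip(sample, weights)) / W
```

For example, a tree that tests item 0 and then tests item 1 only when item 0 is in state 1 costs `cost[0]` on realizations with state 0 for item 0 and `cost[0] + cost[1]` on the others.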

Let be a decision tree that is an optimal solution for the induced instance. Let where is a random realization drawn from . Thus is the expected cost of an optimal solution to the induced instance. Let denote the tree output by running MixedGreedy().

Let be such that for , . Thus, is the realization whose entries are computed in Step 4 of MixedGreedy.

For each node in the tree , let denote the probability that node will be reached when using on a random realization drawn from . Let where is the item labeling node . Consider the backbone constructed during the call to MixedGreedy(). The backbone consists of the nodes created during the two repeat loops in this call, excluding the recursive calls. Let be the set of nodes in the backbone. Let . Thus is the contribution of the nodes in the backbone to the expected cost of tree . The following lemma says that this contribution is no more than a constant times the expected cost of the optimal tree .

###### Lemma 6

.

Lemma 6 is the key technical lemma in our analysis, and it is the proof of this lemma that constitutes the major difference between our analysis and the analysis in Cicalese et al. (2014). We defer the proof of this lemma to Section A.3. Using this lemma, it is easy to generalize the rest of the analysis of Cicalese et al. to obtain the proof of Theorem 1. The proofs in the remainder of this section closely follow the proofs in Cicalese et al. We present them so that this paper will be self-contained.

Let be the budget that is computed in Line 6, with FindBudget, when running MixedGreedy(). Recall the constant defined in FindBudget, based on the bound on Wolsey’s Greedy algorithm (Lemma 5).

###### Lemma 7

The condition at the end of the first repeat loop (spent ) will be satisfied. Also, .

Proof  Trees and must achieve utility on realization . The binary search procedure in FindBudget finds the least budget allowing Wolsey’s greedy algorithm to achieve a total increase in utility of at least , on realization . It follows from the bound on Wolsey’s greedy algorithm (Lemma 5) that on realization , an increase of could not be achieved with a budget smaller than . Thus, .

The next lemma clearly holds because in the two repeat loops, we only consider items of cost at most , and we continue choosing items of cost at most until a budget of is met or exceeded.

###### Lemma 8

.

Let denote the final value of in the last recursive call, in Line 32, when running MixedGreedy().

###### Lemma 9

.

Proof  Recall that . For any , let denote the extension of , to , such that (as specified in line 4 of MixedGreedy()) for , and otherwise.

It follows from the way that was computed in FindBudget, and the fact that the value of is on any (full) realization of the items in , that there is a subset such that and .

Let and be the set of items chosen in the first and second repeat loops respectively. Thus .

Let represent the utility gained in the first repeat loop. Let represent the additional utility that the items in would provide. Since and is monotone, , and thus . So . At the end of the first repeat loop the items in have been chosen. If we were to add the items in to those in , it would increase the utility by . Since the items in the second repeat loop are chosen greedily with respect to (and ) until budget is met or exceeded, or goal value is attained, it follows by the approximation bound on Wolsey’s algorithm (Lemma 5) that the amount of additional utility added during the second repeat loop is at least times the amount of additional utility that would be added by instead choosing the items in . We thus have . Adding to both sides, from the definition of we get . We know from above that so we have . The lemma follows because the constant is greater than .

We can now give the proof of Theorem 1, stating that the Mixed Greedy algorithm achieves an approximation factor of .

Proof of Theorem 1  The Mixed Greedy algorithm solves the Scenario SC instance by running recursive function MixedGreedy(). In the initial call, is set to .

Let denote the tree that is output by running MixedGreedy(). Let denote the optimal tree for the Scenario SC instance induced by .

The expected cost of can be broken into the part that is due to costs incurred on items in the backbone in the top-level call to the MixedGreedy function, and costs incurred in the subtrees built in the recursive calls to MixedGreedy. The recursive calls in Steps 18 and 28 build subtrees of that are rooted at a -child of a node labeled , such that . It follows from the definition of that the value of the partial realization used in each of these recursive calls is such that , so .

The remaining recursive call is performed on , and by Lemma 9, .

Let . Let denote the partial realizations on which the recursive calls are made, and for which the value of on the partial realization is strictly less than . These are the recursive calls which result in the construction of non-trivial subtrees, with non-zero cost. Note that may include . For all , , or equivalently

$$Q - g(b^{j}) \le (1-\eta)\,(Q - g(b)) \tag{1}$$

For , let denote the tree returned by the recursive call on .

Let be the sample for the Scenario SC instance induced by , so . Let be the weight function for that induced instance. Let . Let denote an optimal decision tree for the Scenario SC instance induced by . Consider the optimal decision tree for the instance induced by , and use it to form a decision tree for the instance induced by as follows: for each item such that