Online Pandora’s Boxes and Bandits


Hossein Esfandiari, Google Research, New York, NY (esfandiari@google.com)
MohammadTaghi HajiAghayi, University of Maryland, Department of Computer Science, College Park, MD (hajiagha@cs.umd.edu)
Brendan Lucier, Microsoft Research, Cambridge, MA (brlucier@microsoft.com)
Michael Mitzenmacher, Harvard University, School of Engineering and Applied Science, Cambridge, MA (michaelm@eecs.harvard.edu)
Abstract

We consider online variations of the Pandora’s box problem (?), a standard model for understanding issues related to the cost of acquiring information for decision-making. Our problem generalizes both the classic Pandora’s box problem and the prophet inequality framework. Boxes are presented online, each with a random value and cost drawn jointly from some known distribution. Pandora chooses online whether to open each box given its cost, and then chooses irrevocably whether to keep the revealed prize or pass on it. We aim for approximation algorithms against adversaries that can choose the largest prize over any opened box, and use optimal offline policies to decide which boxes to open (without knowledge of the value inside); see Section 2 for formal definitions. We consider variations where Pandora can collect multiple prizes subject to feasibility constraints, such as cardinality, matroid, or knapsack constraints. We also consider variations related to classic multi-armed bandit problems from reinforcement learning. Our results use a reduction-based framework where we separate the issues of the cost of acquiring information from the online decision process of which prizes to keep. Our work shows that in many scenarios, Pandora can achieve a good approximation to the best possible performance.


Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

1 Introduction

Information learning costs play a large role in a variety of markets and optimization tasks. For example, in the academic job market, obtaining information about a potential match is a costly investment for both sides of the market. Conserving on information costs is an important component of efficiency in such settings.

A classic model for information learning costs is the Pandora’s box problem, attributed to Weitzman (?), which has the following form. Pandora has $n$ boxes, where the $i$th box contains a prize of value $v_i$ that has a known cumulative distribution function $F_i$. It costs $c_i$ to open the $i$th box and reveal the actual value $v_i$. Pandora may open as many boxes as she likes, in any order. The payoff is the maximum-valued prize, minus the cost of the opened boxes. That is, if $O$ is the subset of opened boxes, then the payoff Pandora seeks to maximize is
$$\max_{i \in O} v_i - \sum_{i \in O} c_i.$$

The Pandora’s box problem incorporates two key decision aspects: the ordering for opening boxes, and when to stop. It has been proposed for applications such as buying or selling a house and searching for a job.

The original Pandora’s box problem has a simple and elegant solution. The reservation price $\sigma_i$ associated with an unopened box $i$ is the value for which Pandora would be indifferent between taking a prize with that value and opening box $i$. That is, $\sigma_i$ satisfies
$$\mathbb{E}[\max(v_i - \sigma_i, 0)] = c_i.$$

This result says that if Pandora is allowed to choose the ordering, Pandora should keep opening boxes in order of decreasing reservation price, but should stop searching once the largest prize value obtained exceeds the reservation price of all unopened boxes. An alternative to Weitzman’s proof (?) of this was recently provided by Kleinberg, Waggoner, and Weyl (?), who also present additional applications, including to auctions. Very recently Singla (?) generalized the approach of Kleinberg et al. (?) to further applications in offline combinatorial problems such as matching, set cover, facility location, and prize-collecting Steiner tree.
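To make Weitzman’s rule concrete, here is a small Python sketch (our own illustration, not code from the paper): `reservation_price` solves the indifference equation $\mathbb{E}[\max(v - \sigma, 0)] = c$ by bisection over an empirical sample of the value distribution, and `weitzman` runs the resulting open-in-decreasing-reservation-price policy. All function names and the box representation are hypothetical.

```python
def reservation_price(samples, cost, lo=0.0, hi=1e6, iters=60):
    """Solve E[max(v - sigma, 0)] = cost for sigma by bisection over an
    empirical sample of the value distribution (a Monte Carlo estimate)."""
    def gap(sigma):
        return sum(max(v - sigma, 0.0) for v in samples) / len(samples) - cost
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid          # E[(v - mid)^+] still exceeds the opening cost
        else:
            hi = mid
    return 0.5 * (lo + hi)

def weitzman(boxes):
    """Weitzman's rule: open boxes in decreasing reservation-price order,
    stopping once the best prize seen beats every remaining reservation
    price.  Each box is a (cost, draw_value, sigma) triple; returns the
    realized payoff (max prize minus total opening cost)."""
    best, paid = 0.0, 0.0
    for cost, draw_value, sigma in sorted(boxes, key=lambda b: -b[2]):
        if best >= sigma:     # stopping rule: no unopened box is worth opening
            break
        paid += cost
        best = max(best, draw_value())
    return best - paid
```

For example, for a value that is $0$ or $2$ with equal probability and a cost of $0.5$, the reservation price is $1$.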

In other similar problems, the ordering is chosen adversarially and adaptively. For example, in the prophet inequality setting first introduced in 1977 by Garling, Krengel, and Sucheston (??), the boxes have no cost, and the prize distributions are known, but the decision-maker has to decide after each successive box whether to stop the process and keep the corresponding prize; if not, the prize cannot be claimed later. It is known that there exists a threshold-based algorithm that in expectation obtains a prize value within a factor of two of the expected maximum prize (and the factor of two is tight) (??). There have subsequently been many generalizations of the prophet inequality setting, especially to applications in online auctions (see e.g. (?????????????????)).

Another related and well-studied theme includes multi-armed bandit problems and, more generally, reinforcement learning (see, e.g., (?)). In this setting, each “box” corresponds to a strategy, or arm, that has a payoff in each round. An online algorithm chooses one arm from a set of $k$ arms in each of $T$ rounds. Viewed in the language of selection problems, this translates to a feasibility constraint on the set of boxes that can be opened. Multi-armed bandit problems have applications including online auctions, adaptive routing, and the theory of learning in games.

In this paper, we consider a class of problems that combine the cost considerations of Pandora’s box with the online nature of prophet inequality problems. Again boxes are presented online, here with random values and costs drawn jointly from some distribution. Pandora chooses online whether to open each box, and then whether to keep it or pass on it. We aim for approximation algorithms against adversaries that can choose the largest prize over any opened box, and use optimal offline policies in deciding which boxes to open, without knowledge of the value inside. We consider variations where Pandora can collect multiple prizes subject to sets of constraints. For example, Pandora may be able to keep at most $k$ prizes, the selected prizes may be required to form an independent set in a matroid, or the prizes might have associated weights that form a knapsack constraint. We also introduce variations related to classic multi-armed bandit problems and reinforcement learning, where there are feasibility constraints on the set of boxes that can be opened. Our work shows that in many scenarios even without the power of ordering choices, Pandora can achieve a good approximation to the best possible performance.

Our main result is a reduction from this general class of problems, which we refer to as online Pandora’s box problems, to the problem of finding threshold-based algorithms for the associated prophet inequality problems where all costs are zero. Our reduction is constructive, and results in a polynomial-time policy, given a polynomial-time algorithm for constructing thresholds in the reduced problem. We first describe the reduction in Section 2. Then in Section 3, we show how to use known results from the prophet inequality literature to directly infer solutions to online Pandora’s box problems. Finally, in Section 4, we establish an algorithm for a multi-armed bandit variant of the online Pandora’s box problem, by proving a novel multi-armed prophet inequality.

2 Pandora’s Boxes Under General Constraints

In this section we consider a very general version of an online Pandora’s box problem, with the goal of showing that, if there is a suitable corresponding prophet inequality algorithm, we can use it in a way that yields good approximation ratios for the Pandora’s box problem. We define the problem as follows. There is a sequence of $n$ boxes $b_1, \ldots, b_n$ that arrive online, in an order chosen by an adversary (i.e., worst-case order). Each box $b_i$ has a cost $c_i$, a value $v_i$, and a type $t_i$. The tuple $(c_i, v_i, t_i)$ is drawn from a joint distribution $\mathcal{D}_i$. The distributions $\mathcal{D}_1, \ldots, \mathcal{D}_n$ are known in advance. When a box is presented, we observe its type $t_i$. We can then choose whether to open the box. We note that, given the type $t_i$, $c_i$ and $v_i$ have conditional distributions depending on the type. There is a set of constraints dictating what combinations of boxes can be opened; these constraints can depend on the indices of the boxes and their types. If we open the box, then $c_i$ and $v_i$ are revealed, and we pay $c_i$ for opening the box. We must then choose (irrevocably) whether to keep the box and collect $v_i$. There is also a set of constraints dictating what combinations of boxes can be kept; these constraints can depend on the indices of the boxes and their types. We denote the set of opened boxes by $O$ and the set of kept boxes by $K$. The goal is to maximize the expected utility $\mathbb{E}\big[\sum_{i \in K} v_i - \sum_{i \in O} c_i\big]$.
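As a concrete (hypothetical) rendering of this process, the following Python sketch simulates one online pass. The `policy` interface and the toy `KeepPositivePolicy` are our own illustrative assumptions, not the paper’s notation; a real policy must also enforce the feasibility constraints on opened and kept sets.

```python
def run_online_pandora(boxes, policy):
    """Simulate one pass of the online Pandora's box process.

    `boxes` is a list of (cost, value, type) tuples in arrival order
    (the realized draws; the policy sees only the type before opening).
    `policy` exposes open_box(i, typ) -> bool and
    keep_box(i, cost, value, typ) -> bool, and is responsible for
    respecting any feasibility constraints.  Returns the utility
    sum(kept values) - sum(opened costs)."""
    utility = 0.0
    for i, (cost, value, typ) in enumerate(boxes):
        if not policy.open_box(i, typ):
            continue
        utility -= cost                      # pay to open; cost and value revealed
        if policy.keep_box(i, cost, value, typ):
            utility += value                 # irrevocably keep the prize
    return utility

class KeepPositivePolicy:
    """Toy policy: open every box, keep any prize with positive value
    (ignores feasibility constraints; for illustration only)."""
    def open_box(self, i, typ):
        return True
    def keep_box(self, i, cost, value, typ):
        return value > 0
```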

One might want to compare against an adversary that sees the realized values inside all the boxes, that is, a fully clairvoyant adversary. However, it is not possible to provide any competitive algorithm against such an adversary, even for the simple classical Pandora’s box problem. To see this, consider the following example with $n$ identical (and independent) boxes, where we have no constraint on opening boxes and can accept exactly one box at the end. The value of each box is $v > 0$ with probability $p$ and $0$ with probability $1-p$; the cost of each box is $pv$. Note that the cost of each box is equal to its expected value. Hence, the expected utility of any online algorithm is upper bounded by $0$. However, with probability $1-(1-p)^n$, at least one of the boxes has value $v$. A fully clairvoyant adversary opens only a box with value $v$ and obtains a utility of $v - pv$. This is a positive utility whenever $p < 1$.

We denote the expected utility of an algorithm $\mathcal{A}$ by $\mathbb{E}[\mathcal{A}]$. We compare our algorithms against the (potentially exponential-time) optimal offline algorithm $\mathrm{OPT}$ that maximizes the expected utility. Specifically, $\mathrm{OPT}$ can see the types of all the boxes, and can choose to open boxes in any order. However, $\mathrm{OPT}$ does not learn the resulting cost $c_i$ and value $v_i$ of a box $b_i$ until it is opened. $\mathrm{OPT}$ iteratively and adaptively opens boxes and at the end chooses a subset of the opened boxes to keep. Of course, $\mathrm{OPT}$ must respect the constraints on opened boxes and kept boxes.

We first prove some fundamental lemmas that capture important structure of this Pandora’s box problem. We then use these lemmas to provide a strong connection between the online Pandora’s box problem and prophet inequalities. Our results allow us to translate several prophet inequality algorithms, such as prophet inequalities under capacity constraints, matroid constraints, or knapsack constraints, into algorithms for the online Pandora’s box problem under the same constraints.

2.1 Fundamental Lemmas

Our lemmas allow us flexibility in considering the distribution of costs, and show how we can preserve approximation ratios.

Definition 1

Let $B = \langle b_1, \ldots, b_n \rangle$ and $B' = \langle b'_1, \ldots, b'_n \rangle$ be two sequences of boxes. Denote the outcomes of $b_i$ and $b'_i$ by $(c_i, v_i, t_i)$ and $(c'_i, v'_i, t'_i)$ respectively. We say the two sequences are cost-equivalent if (a) they can be coupled so that for all $i$ we have $v_i = v'_i$ and $t_i = t'_i$, and (b) $\mathbb{E}[c_i \mid t_i = t] = \mathbb{E}[c'_i \mid t'_i = t]$ for all $i$ and all types $t$.

Lemma 2

Let $B$ and $B'$ be two cost-equivalent sequences of boxes. Let $\mathcal{A}$ be an online (resp., offline) algorithm that achieves an expected utility $u$ on boxes $B'$. There exists an online (resp., offline) algorithm $\mathcal{A}'$ that achieves the same expected utility $u$ on boxes $B$.

Proof :   We will first suppose that $\mathcal{A}$ is online, so that the order of arrival is predetermined and the types are revealed online. We will define algorithm $\mathcal{A}'$ using the run of algorithm $\mathcal{A}$ on a simulated set of boxes $B'$. When $\mathcal{A}$ attempts to open a box $b'_i$ we do the following. We open $b_i$, and let $(c_i, v_i, t_i)$ be the outcome. Then we draw a triple $(c'_i, v'_i, t'_i)$ from $\mathcal{D}'_i$, conditioning on $v'_i = v_i$ and $t'_i = t_i$, and report it to $\mathcal{A}$. $\mathcal{A}'$ then opens a box if and only if $\mathcal{A}$ does, and likewise keeps a box if and only if $\mathcal{A}$ does.

Let $O_i$ be a binary random variable that is $1$ if $\mathcal{A}'$ opens box $b_i$ and $0$ otherwise. Also, let $K_i$ be a binary random variable that is $1$ if $\mathcal{A}'$ keeps box $b_i$ and $0$ otherwise. Note that for any particular $i$, at the time that the algorithm decides whether to open $b_i$, $c_i$ is unknown to the algorithm. Moreover, $c_i$ may be correlated with $v_i$, but is independent of all observations of the algorithm from prior rounds. Therefore, after conditioning on $t_i$, $c_i$ is independent of $O_i$. We have

$$\mathbb{E}[\mathcal{A}'] = \sum_i \mathbb{E}[v_i K_i] - \sum_i \mathbb{E}[c_i O_i] = \sum_i \mathbb{E}[v_i K_i] - \sum_i \mathbb{E}\big[\mathbb{E}[c_i \mid t_i]\, O_i\big] = \sum_i \mathbb{E}[v'_i K_i] - \sum_i \mathbb{E}\big[\mathbb{E}[c'_i \mid t'_i]\, O_i\big] = \mathbb{E}[\mathcal{A}].$$
The case where $\mathcal{A}$ is an offline algorithm is similar. The only difference is that the full profile of types is known to the algorithm in advance, and hence $O_i$ and $K_i$ can depend on this profile. We therefore fix the type profile, interpret the variables $O_i$ and $K_i$ as being conditioned on this realization of the types, interpret all expectations with respect to this conditioning, and the argument proceeds as before (noting that the distribution of $c_i$ can depend on $t_i$, but is independent of the other types). Note that this actually simplifies the chain of equalities above, as the conditioning on $t_i$ is trivial and unnecessary when the type profile is fixed.

Lemma 3

Let $B$ and $B'$ be two cost-equivalent sequences of boxes. Let $\mathcal{A}$ be an online (resp., offline) $\alpha$-approximation algorithm on boxes $B'$. There exists an online (resp., offline) $\alpha$-approximation algorithm on boxes $B$.

Proof :   Let $\mathrm{OPT}$ and $\mathrm{OPT}'$ be the optimum (offline) algorithms for boxes $B$ and $B'$ respectively. Let $\mathcal{A}'$ be the algorithm of Lemma 2 applied to $\mathcal{A}$. Moreover, applying Lemma 2 to $\mathrm{OPT}$ (with the roles of $B$ and $B'$ swapped) implies that there is some offline algorithm $\mathrm{OPT}''$ on boxes $B'$ such that $\mathbb{E}[\mathrm{OPT}''] = \mathbb{E}[\mathrm{OPT}]$. We bound the approximation factor of $\mathcal{A}'$ as follows:

$$\mathbb{E}[\mathcal{A}'] = \mathbb{E}[\mathcal{A}] \;\ge\; \alpha\,\mathbb{E}[\mathrm{OPT}'] \;\ge\; \alpha\,\mathbb{E}[\mathrm{OPT}''] \;=\; \alpha\,\mathbb{E}[\mathrm{OPT}],$$

where the first equality is by Lemma 2, the first inequality holds since $\mathcal{A}$ is an $\alpha$-approximation algorithm on $B'$, the second inequality is by the definition of $\mathrm{OPT}'$ as the optimum on $B'$, and the last equality is by the choice of $\mathrm{OPT}''$.

Next we define the commitment Pandora’s box problem. It is similar to the Pandora’s box problem with the following two restrictions, which we refer to as freeness and commitment, respectively.

  • Freeness: Opening any box is free, i.e., for all $i$ we have $c_i = 0$.

  • Commitment: If a box $b_i$ is opened and the revealed value is the maximum possible value of $v_i$, the box is kept.

Note that the commitment constraint is without loss of generality for an online algorithm, but is a non-trivial restriction for an offline algorithm.

For each $i$, and for each type $t$, we define a threshold $\tau_{i,t}$ so that we have $\mathbb{E}[\max(v_i - \tau_{i,t}, 0) \mid t_i = t] = \mathbb{E}[c_i \mid t_i = t]$. If $\mathbb{E}[c_i \mid t_i = t] = 0$, we set $\tau_{i,t}$ to the supremum of the support of $v_i$ conditioned on $t_i = t$. (It is possible to have $\tau_{i,t}$ be infinity, with the natural interpretation.) We use these thresholds to define the capped sequence of free boxes $B'' = \langle b''_1, \ldots, b''_n \rangle$, with $c''_i = 0$, $v''_i = \min(v_i, \tau_{i,t_i})$, and $t''_i = t_i$, used in the following theorem.

Theorem 4

Let $\mathcal{A}$ be an $\alpha$-approximation algorithm for the commitment Pandora’s box problem on boxes $B''$. There exists an $\alpha$-approximation algorithm for the Pandora’s box problem on $B$.

Proof :   First we define a sequence of boxes $B'$ that is cost-equivalent to $B$: for all $i$ we set $c'_i = \max(v_i - \tau_{i,t_i}, 0)$, $v'_i = v_i$, and $t'_i = t_i$. Note that by the definition of $\tau_{i,t}$ we have $\mathbb{E}[c'_i \mid t'_i = t] = \mathbb{E}[c_i \mid t_i = t]$ for all $i$ and $t$. Thus, by Lemma 3, an (online) $\alpha$-approximation algorithm for the Pandora’s box problem on $B'$ implies an $\alpha$-approximation algorithm for the Pandora’s box problem on $B$, as desired. Next, we construct the algorithm $\mathcal{A}'$ required by Lemma 3 using $\mathcal{A}$. To construct $\mathcal{A}'$, whenever $\mathcal{A}$ attempts to open a box $b''_i$, we open $b'_i$ and report $v''_i = \min(v'_i, \tau_{i,t_i})$ to $\mathcal{A}$. $\mathcal{A}'$ keeps the same set of boxes as $\mathcal{A}$.

Let $O_i$ be a binary random variable that is $1$ if $\mathcal{A}'$ opens box $b'_i$ and $0$ otherwise. Also, let $K_i$ be a binary random variable that is $1$ if $\mathcal{A}'$ keeps box $b'_i$ and $0$ otherwise. Note that $v''_i \le \tau_{i,t_i}$, and $v''_i$ achieves its maximum value $\tau_{i,t_i}$ whenever $v_i \ge \tau_{i,t_i}$. In this case, by the commitment constraint, we have $K_i = O_i$. Therefore we either have $c'_i = 0$ or $K_i = O_i$, which gives us

$$v'_i K_i - c'_i O_i = v''_i K_i. \qquad (1)$$

Then for any fixed profile of types, and taking expectations conditional on those type realizations, we have

$$\mathbb{E}[\mathcal{A}'] \;=\; \mathbb{E}\Big[\sum_i v'_i K_i - c'_i O_i\Big] \;=\; \mathbb{E}\Big[\sum_i v''_i K_i\Big] \;=\; \mathbb{E}[\mathcal{A}],$$

where the first equality is by the definition of utility, the second equality is by Equation (1), and the last equality holds since $\mathcal{A}'$ opens and keeps the same sets of boxes as $\mathcal{A}$. Similarly, we can show $\mathbb{E}[\mathrm{OPT}''] \ge \mathbb{E}[\mathrm{OPT}']$, where $\mathrm{OPT}'$ is the optimum algorithm for the Pandora’s box problem on $B'$ and $\mathrm{OPT}''$ is the optimum algorithm for the commitment Pandora’s box problem on $B''$. Therefore $\mathcal{A}'$ is an $\alpha$-approximation algorithm for the Pandora’s box problem on $B'$, as promised.

2.2 A Reduction for Online Pandora’s Box Problems

In this section we use Theorem 4 to provide a strong connection between the online Pandora’s box problem under general constraints and prophet inequalities. This leads to our main result: we prove that a threshold-based algorithm for a prophet inequality problem, under any given feasibility constraints on boxes that can be opened and/or prizes that can be kept, immediately translates into an algorithm for the online Pandora’s box problem. This reduction preserves the approximation factor of the threshold-based algorithm. This implies several approximation algorithms for the online Pandora’s box problem under different constraints, which we discuss in Section 3.

Recall that, in the online Pandora’s Box problem, the algorithm is permitted to open a set of boxes $O$ and keep a set of boxes $K$, collecting the reward $\sum_{i \in K} v_i - \sum_{i \in O} c_i$, where $O$ and $K$ are restricted to be from arbitrary predefined collections of feasible sets. In the associated prophet inequality problem, the costs of all boxes are known to be $0$. In Theorem 5 below, we use the notion of threshold-based algorithms for the prophet inequality problem, defined as follows. We say an algorithm is threshold-based if for every box $i$ we have a threshold $T_i$ (where the threshold can depend on the type as well as the index) and the algorithm keeps a box if and only if its value is not less than $T_i$. The threshold may be adaptive; that is, it may depend on any observation prior to observing the $i$th box.

Theorem 5

Let $\mathcal{A}$ be a threshold-based $\alpha$-approximation algorithm for the prophet inequality problem, under a collection of constraints. There exists an $\alpha$-approximation algorithm for the online Pandora’s box problem under the same constraints.

Proof :   We define the capped sequence $B''$ with $v''_i = \min(v_i, \tau_{i,t_i})$ as in Theorem 4. Next we give an $\alpha$-approximation algorithm $\mathcal{A}''$ for the commitment Pandora’s box problem on boxes $B''$. This together with Theorem 4 will prove the theorem.

Let $T_i$ be the threshold used by $\mathcal{A}$ given the values observed so far. We define $\mathcal{A}''$ as follows. Upon arrival of box $b''_i$, first we check whether $T_i \le \tau_{i,t_i}$. Note that this implicitly implies that the box is acceptable according to the constraints. If $T_i \le \tau_{i,t_i}$ we open the box, otherwise we skip it. This ensures the commitment constraint, since the maximum possible value of $v''_i$ is $\tau_{i,t_i}$, and any such value would be kept. If we opened the box and $v''_i \ge T_i$ we keep it, otherwise we ignore it and continue. It is easy to observe that $\mathcal{A}$ and $\mathcal{A}''$ provide the same outcome and have the same approximation factor.
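The reduction above can be sketched in Python as follows. The interface (`caps`, `prophet_threshold`) is our own hypothetical rendering of the construction: the caps $\tau_{i,t}$ are assumed precomputed, and the prophet algorithm is supplied as a black-box (possibly adaptive) threshold rule.

```python
def pandora_from_prophet(boxes, caps, prophet_threshold):
    """Turn a threshold rule for the zero-cost prophet problem into an
    online Pandora policy (sketch of the Theorem-5 reduction).

    boxes: realized (cost, value, type) tuples in arrival order.
    caps:  caps[(i, t)] is the cap tau_{i,t} solving
           E[max(v_i - tau, 0) | type t] = E[c_i | type t].
    prophet_threshold(i, typ, history) -> T_i, the threshold the prophet
    algorithm would use for capped box i given past capped values."""
    utility, history = 0.0, []
    for i, (cost, value, typ) in enumerate(boxes):
        tau = caps[(i, typ)]
        T = prophet_threshold(i, typ, history)
        if T > tau:              # capped value min(v, tau) can never reach T:
            continue             # skip the box (preserves the commitment rule)
        utility -= cost          # open the box and pay its cost
        capped = min(value, tau)
        history.append(capped)
        if capped >= T:
            utility += value     # keep the real (uncapped) prize
    return utility
```

For a single-choice constraint, the supplied `prophet_threshold` can return an infinite threshold once a prize has been kept (detectable from `history`), so no second prize is accepted.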

3 Algorithms For Online Pandora’s Box via Prophet Inequalities

Here we use the tools from the previous section to provide algorithms for the online Pandora’s box problem under different kinds of constraints. First, as a warm-up, in Theorem 6 we show a $\frac{1}{2}$-approximation algorithm for a simple version of the problem where there are no constraints on the set of boxes that we can open, and we can only keep the value of one box. Note that this is the online version of the classical Pandora’s box problem. Indeed, it is known that there is no approximation algorithm with a factor better than $\frac{1}{2}$ for this problem even if all of the costs are $0$ (where the problem is equivalent to a basic prophet inequalities problem) (?). We will prove Theorem 6 directly, without appealing to Theorem 5, to provide insight into how the given thresholds translate into a policy for the Pandora’s Box problem.

Theorem 6

There exists a $\frac{1}{2}$-approximation algorithm for the online Pandora’s box problem with no constraints on opening boxes, where the value of exactly one box is kept.

Proof :   We define the capped sequence $B''$ as in Theorem 4. Next we give a simple $\frac{1}{2}$-approximation algorithm $\mathcal{A}$ for the commitment Pandora’s box problem on boxes $B''$. This together with Theorem 4 proves the theorem.

Set a threshold $T$ such that $\Pr[\max_i v''_i \ge T] = \frac{1}{2}$. Let $X$ be a random variable that indicates the first index $i$ such that $v''_i \ge T$. Let $Z = v''_X$ if there exists such an index, and let $Z = 0$ otherwise. It is known that $\mathbb{E}[Z] \ge \frac{1}{2}\,\mathbb{E}[\max_i v''_i]$ (?).

We define $\mathcal{A}$ as follows. Upon arrival of box $b''_i$, first we check whether $T \le \tau_{i,t_i}$. If it is, we open the box; otherwise we skip it. This ensures the commitment constraint; that is, if we observe the maximum possible value $v''_i = \tau_{i,t_i}$, we will accept it. Next, if we opened the box and $v''_i \ge T$, we keep it and terminate. Otherwise we continue to the next box. It is easy to observe that $\mathcal{A}$ keeps $Z$, and hence is a $\frac{1}{2}$-approximation algorithm.
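A hypothetical Python sketch of this single-choice policy: the threshold is estimated as the empirical median of sampled maxima of the capped values, and the policy keeps the first prize that reaches it. The sampling interface and names are our own assumptions.

```python
import random

def median_threshold(sample_max, trials=4001, seed=0):
    """Estimate a threshold T with Pr[max_i v''_i >= T] ~ 1/2 by sampling;
    `sample_max(rng)` is assumed to draw one realization of max_i v''_i."""
    rng = random.Random(seed)
    draws = sorted(sample_max(rng) for _ in range(trials))
    return draws[trials // 2]          # empirical median of the maximum

def single_choice_policy(values, T):
    """Keep the first observed prize whose value reaches T (0.0 if none)."""
    for v in values:
        if v >= T:
            return v
    return 0.0
```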

We now explore other applications of our reduction. Theorem 5, together with previously-known approximation algorithms for prophet inequality problems with various types of constraints, implies the existence of approximation algorithms for variations of the online Pandora’s box problem. Specifically, we have the following variations:

  • Online $k$-Pandora’s box problem: we are given a cardinality bound $k$, and at most $k$ boxes can be kept.

  • Online knapsack Pandora’s box problem: we have a capacity $C$, and the type of each box corresponds to a size. The total size of the boxes that are kept must be at most $C$.

  • Online matroid Pandora’s box problem: we have a matroid constraint on the set of boxes, and the boxes that are kept must be an independent set of the matroid.

We note that all of the variations above have no constraints on opening boxes; however, in what follows, we study a variation of the problem with constraints on opening boxes.

As we have mentioned, prophet inequality approximation algorithms for the settings of cardinality constraints (?), knapsack constraints (?), and matroid constraints (?) exist. Making use of these results and Theorem 5 implies the following corollary.

Corollary 7

There is

  • a $\big(1 - 1/\sqrt{k+3}\big)$-approximation algorithm for the online $k$-Pandora’s box problem,

  • a constant-factor approximation algorithm for the online knapsack Pandora’s box problem,

  • and a $\frac{1}{2}$-approximation algorithm for the online matroid Pandora’s box problem.

4 Pandora’s Box with Multiple Arms

In the context of reinforcement learning, we next consider a multi-arm version of the Pandora’s box problem. In each of $T$ rounds, $k$ boxes are presented. There are therefore $kT$ boxes in total. At most one box can be opened in each round. The boxes presented in a given round are ordered; we can think of each box as having a type labeled $1$ through $k$. All boxes of type $i$ have the same cost $c_i$, and also have the same value distribution $F_i$. For notational convenience we’ll write $v_i^t$ for the value in the box of type $i$ presented at time $t$; for convenience we assume $v_i^t \ge 0$. At the end of the $T$ rounds, the player can keep at most one prize for each type of box. That is, if we write $S_i$ for the subset of rounds in which the player opens a box of type $i$, then the objective is to maximize

$$\sum_{i=1}^{k} \max_{t \in S_i} v_i^t \;-\; \sum_{i=1}^{k} |S_i|\, c_i.$$

If no box of type $i$ is opened, we’ll define $\max_{t \in S_i} v_i^t$ (i.e., the prize for that type) to be $0$. This is a variant of the online Pandora’s box problem with a (partition) matroid constraint on the set of prizes that can be kept, and also a constraint on the subsets of boxes that can be opened.

We can think of this problem as presenting boxes one at a time, where the $k$ boxes from a round are presented sequentially, with the additional constraint that at most one box per round can be opened. Note that types here are not random, but depend on the index of the box. Our previous reduction applies, so that solving this multi-arm Pandora’s box problem reduces to developing the related prophet inequality. In this prophet inequality, boxes can be opened at no cost, but we must irrevocably choose whether or not to keep any given prize as it is revealed. We can still keep at most one prize of each type, and we can still open at most one box in each time period. Our question becomes: can we develop a threshold policy that achieves a constant-factor prophet inequality for this setting? In this case, a threshold policy corresponds to choosing a threshold $\tau_i$ for each box type $i$, and accepting a prize from an opened box if and only if $v_i^t \ge \tau_i$.

What is an appropriate benchmark for the prophet inequality? Note that the sum of the best prizes of each type, ex post, might not be achievable by any policy due to the restriction on which boxes can be opened. We will therefore compare against the following weaker benchmark. We consider a prophet who must choose, in an online fashion, one box to open in each round, given knowledge of the prizes in previously opened boxes. Then, after having opened one box on each of the $T$ rounds, the prophet can select the largest observed prize of each type. In other words, the prophet has the advantage of being able to choose from among the opened boxes in retrospect, but must still open boxes in an online fashion. Our goal is to obtain a constant approximation to the expected value enjoyed by such a prophet.

We begin with some observations about the choice of which box to open. First, the optimal policy for the prophet is to open boxes greedily. In particular, this policy can be implemented in time linear in $k$ per round.

Lemma 8

In each round $t$, it is optimal for the prophet to open a box of the type that maximizes his expected value, as if the game were to end after time $t$.

Proof :   Note that the expected marginal gain of opening a box of type $i$ can only decrease over time, and only as more boxes of that type are opened. Suppose $b$, of type $i$, is the box with maximum expected marginal value at time $t$. Suppose further that the prophet does not open box $b$, and furthermore opens no box of type $i$ until the final round $T$. Then type $i$ will still have maximum expected marginal value in round $T$, and therefore it would be optimal to open a box of type $i$ on the last round. This implies that it is optimal to open at least one box of type $i$ at some point between round $t$ and the end of the game. The prophet is therefore at least as well off opening the box of type $i$ immediately, since doing so does not affect the distribution of the revealed value, and this can only provide more information for determining which other boxes to open. It is therefore (weakly) optimal to open box $b$ at time $t$.

Similarly, once thresholds are fixed, an identical argument implies that the optimal threshold-based policy behaves greedily. In particular, the optimal policy can be implemented in polynomial time, given an arbitrary set of thresholds.

Lemma 9

Suppose the player’s policy is committed to selecting a prize of type $i$, from an opened box of type $i$, if and only if its value is at least $\tau_i$. Then, in each round $t$, it is optimal to open a box of a type, from among those types for which a prize has not yet been accepted, that maximizes the expected value as if the game were to end after time $t$. That is, a type $i$ maximizing $\mathbb{E}[v_i \cdot \mathbf{1}(v_i \ge \tau_i)]$ over the types not yet accepted.
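The greedy rule described above can be sketched in Python with a Monte Carlo estimator (the estimator and interface are our own illustration, not the paper’s):

```python
import random

def greedy_type_to_open(type_samplers, accepted, thresholds, rng=None, samples=2000):
    """Myopic opening rule in the spirit of Lemma 9: among types whose
    prize has not yet been accepted, open the type maximizing the
    expected kept value if the game ended now, i.e. E[v * 1(v >= tau_i)],
    estimated by sampling each type's value distribution."""
    rng = rng or random.Random(0)
    best_type, best_gain = None, float('-inf')
    for i, draw in enumerate(type_samplers):
        if accepted[i]:
            continue                   # a prize of this type is already kept
        tau = thresholds[i]
        est = sum(v for v in (draw(rng) for _ in range(samples)) if v >= tau)
        gain = est / samples           # Monte Carlo estimate of E[v * 1(v >= tau)]
        if gain > best_gain:
            best_type, best_gain = i, gain
    return best_type                   # None if every type has been accepted
```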

We now claim that there are thresholds that yield a $\frac{1}{2}$-approximate prophet inequality for this setting. First, some notation. By the principle of deferred randomness, we can think of the value of the prize in any given box as only being determined at the moment the box is opened. With this in mind, we will write $y_i^j$ for the value observed in the $j$’th box of type $i$ opened by the decision-maker. For example, $y_i^1$ is the value contained in whichever box of type $i$ is opened first, regardless of the exact time at which it is opened. Note that the behavior of any online policy is fully described by the profile of values $y = (y_i^j)$, and each $y_i^j$ is a value drawn independently from distribution $F_i$. For such a profile $y$, write $A_i^j$ for the indicator variable that is $1$ if the prophet opens at least $j$ boxes of type $i$, and keeps the $j$’th one opened. Then the expected value enjoyed by the prophet is

$$\mathbb{E}\Big[\sum_{i}\sum_{j} y_i^j A_i^j\Big].$$

For each box type $i$, we will set the threshold

$$\tau_i = \frac{1}{2}\,\mathbb{E}\Big[\sum_j y_i^j A_i^j\Big].$$

That is, $\tau_i$ is half of the expected value obtained by the prophet from boxes of type $i$.

To prove that these thresholds achieve a good approximation to the prophet’s welfare, it will be useful to analyze the possible correlation between the number of boxes of type $i$ opened by the prophet, and whether any prize of type $i$ is kept by the threshold algorithm. To this end, write $O_i^j$ for the indicator that the prophet opens at least $j$ boxes of type $i$. Note that for all $i$ and $j$, $A_i^j \le O_i^j$, $O_i^{j+1} \le O_i^j$, and $\sum_j A_i^j \le 1$. We’ll also write $Z_i^j$ for the indicator variable that is $1$ if $y_i^\ell < \tau_i$ for every $\ell \le j$ such that the prophet opens an $\ell$’th box of type $i$. That is, $Z_i^j = 1$ if no box of type $i$ among the first (up to) $j$ opened by the prophet has value greater than $\tau_i$, and hence the threshold algorithm does not keep any of those prizes. The following lemma shows that $O_i^j$ is positively correlated with $Z_i^{j-1}$, for every $i$ and $j$.

Lemma 10

For all $i$ and $j$, $\mathbb{E}[O_i^j Z_i^{j-1}] \ge \mathbb{E}[O_i^j]\,\mathbb{E}[Z_i^{j-1}]$.

Proof :   Fix the values of $y_{i'}^j$ for all types $i' \ne i$. Note that this also fixes the choice of which box the prophet will open, on any round in which the prophet chooses not to open a box of type $i$. Due to the prophet’s greedy method of opening boxes, at every time $t$, the prophet will choose to open the box of type $i$ if and only if the maximum value from a box of type $i$ seen so far (or $0$ if no box of type $i$ has been opened yet) is below a threshold determined by the values observed from the other boxes. In other words, the values $y_{i'}^j$ for $i' \ne i$ define a sequence of non-decreasing thresholds $\xi_1 \le \xi_2 \le \cdots$ with the following property. Suppose, at time $t$, the prophet has previously opened $\ell - 1$ boxes of type $i$, and has therefore opened $t - \ell$ boxes of types other than $i$. Then the prophet will open a box of type $i$ at time $t$ if and only if

$$\max_{m < \ell} y_i^m < \xi_{t - \ell + 1}. \qquad (2)$$

Here and below, we’ll take the maximum of an empty set to be $0$.

We now claim that the prophet opens $j$ or more boxes of type $i$, at or before time $t$, if and only if $\max_{m < j} y_i^m < \xi_{t-j+1}$. We prove this claim by induction on $j$. The case $j = 1$ is immediate, since by (2) and the monotonicity of the thresholds, the first box of type $i$ is opened at or before time $t$ if and only if $0 < \xi_t$. Now suppose $j > 1$. If $\max_{m<j} y_i^m \ge \xi_{t-j+1}$, then $\max_{m<j} y_i^m \ge \xi_{t'-j+1}$ at all times $t' \le t$, so by (2) the prophet never chooses to open the $j$’th box of type $i$ at or before time $t$. In the other direction, note that if $\max_{m<j} y_i^m < \xi_{t-j+1}$, then $\max_{m<j-1} y_i^m < \xi_{(t-1)-(j-1)+1}$, so by induction the prophet opens at least $j-1$ boxes of type $i$ by time $t-1$. This means that either the prophet has already opened $j$ boxes of type $i$ before time $t$ (and we are done), or it has opened exactly $j-1$ boxes of type $i$ before time $t$. In the latter case, since we have $\max_{m<j} y_i^m < \xi_{t-j+1}$, we conclude from (2) that the prophet opens a box of type $i$ at time $t$, as required.

We are now ready to return to $O_i^j$ and $Z_i^{j-1}$. From the claim above, we have that $O_i^j = 1$ if and only if $\max_{m<j} y_i^m < \xi_{T-j+1}$. It suffices to show that this event is only more likely to occur if we condition on the event $Z_i^{j-1} = 1$. We will actually consider a stronger event $E$, which is that $y_i^m < \tau_i$ for all $m \le j-1$. Note that event $E$ is more stringent than the event $Z_i^{j-1} = 1$, since event $E$ requires that all of the first $j-1$ values from boxes of type $i$ are at most $\tau_i$, whether or not those boxes are opened. But since these events differ only on values that are in unopened boxes, we have $\mathbb{E}[O_i^j Z_i^{j-1}] = \mathbb{E}[O_i^j \mathbf{1}_E]$, so it suffices to prove that $O_i^j$ is positively correlated with event $E$.

To show that $\Pr[O_i^j = 1 \mid E] \ge \Pr[O_i^j = 1]$, we will couple outcomes with and without this conditioning on $E$. To do so, we imagine first drawing a sequence $y$, then re-drawing any values $y_i^m$ with $m \le j-1$ that are greater than $\tau_i$ until all such values are at most $\tau_i$; say $\tilde{y}$ is the modified profile. Note that since $\tilde{y}_i^m \le y_i^m$ for all $m$, we have that if $O_i^j = 1$ under $y$, then it must also be that $O_i^j = 1$ under $\tilde{y}$. So the expected value of $O_i^j$ can only increase as a result of this transformation. Since $y$ is chosen from the unconditioned distribution over profiles, and $\tilde{y}$ is distributed as a profile conditioned to satisfy event $E$, we conclude that $\Pr[O_i^j = 1 \mid E] \ge \Pr[O_i^j = 1]$ as required.

Finally, we can prove the multi-arm prophet inequality.

Theorem 11

The optimal threshold policy, using the thresholds described above, achieves at least half of the expected value enjoyed by the prophet for the multi-armed prophet inequality problem.

Proof :   As above, write $A_i^j$ for the indicator variable that is $1$ if the prophet opens at least $j$ boxes of type $i$ and keeps the $j$’th one opened, and write $O_i^j$ for the indicator that the prophet opens at least $j$ boxes of type $i$. We’ll show a $\frac{1}{2}$-approximation for the policy that uses the given thresholds, but chooses to open the same boxes that the prophet would open. This will imply the theorem, since the optimal policy that uses these thresholds would do at least as well as the policy that opens the same boxes as the prophet.

Given this choice of which boxes to open, let $\hat{A}_i^j$ be the indicator variable that is $1$ if the threshold algorithm opens at least $j$ boxes of type $i$ and keeps the $j$’th one opened. The total value obtained by the threshold algorithm is then

$$\sum_i \sum_j \mathbb{E}[y_i^j \hat{A}_i^j].$$

Note that the threshold algorithm might not choose a prize of every type, since it might be that all observed prizes of type $i$ are less than $\tau_i$. Write $\hat{Z}_i^j$ for the indicator variable that is $1$ if none of the first (up to) $j$ boxes of type $i$ opened by the threshold algorithm are kept by the algorithm; since the algorithm opens the same boxes as the prophet, $\hat{Z}_i^j = Z_i^j$. We’ll also write $\hat{Z}_i$ for the indicator variable that is $1$ if no prize of type $i$ is kept by the threshold algorithm at any time. In particular, $\hat{Z}_i \le \hat{Z}_i^j$ for all $j$. Finally, we’ll write $z_i = \mathbb{E}[\hat{Z}_i]$.

We decompose the value generated by the algorithm into (a) the value attributable to the thresholds, and (b) any value in excess of the thresholds. That is,

$$\sum_i \sum_j \mathbb{E}[y_i^j \hat{A}_i^j] = \sum_i \sum_j \mathbb{E}[\tau_i \hat{A}_i^j] + \sum_i \sum_j \mathbb{E}[(y_i^j - \tau_i) \hat{A}_i^j].$$

For the first term, we have

$$\sum_i \sum_j \mathbb{E}[\tau_i \hat{A}_i^j] = \sum_i \tau_i \Pr\Big[\sum_j \hat{A}_i^j = 1\Big] = \sum_i \tau_i (1 - z_i). \qquad (3)$$

The second term is more interesting. We have

(4)

where the first inequality is linearity of expectation and the definition of , the equality on the second line uses the fact that is independent of the values that occur earlier (which determine and , and the inequality on the fourth line is Lemma 10. The result now follows by adding (4) and (4), yielding

as required.
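Theorem 11 generalizes the classic single-choice prophet inequality, in which the fixed threshold equal to half the expected maximum already guarantees half the prophet’s value (Samuel-Cahn 1984). As an illustrative sanity check of that single-choice special case (not the multi-armed policy analyzed above), the following sketch simulates the threshold rule on a hypothetical instance of independent uniform values:

```python
import random

# Hypothetical single-choice instance: independent values V_i ~ U[0, b_i].
random.seed(0)
bounds = [1.0, 2.0, 0.5, 3.0, 1.5]
TRIALS = 100_000

def sample_profile():
    return [random.uniform(0, b) for b in bounds]

# Prophet benchmark: expected maximum over the whole profile.
prophet = sum(max(sample_profile()) for _ in range(TRIALS)) / TRIALS

# Threshold policy: keep the first value that meets tau = E[max] / 2.
tau = prophet / 2

def run_policy():
    return next((v for v in sample_profile() if v >= tau), 0.0)

policy = sum(run_policy() for _ in range(TRIALS)) / TRIALS
ratio = policy / prophet
# On this instance the empirical ratio lands well above the 1/2 guarantee.
print(f"prophet={prophet:.3f}  policy={policy:.3f}  ratio={ratio:.3f}")
```

The same decomposition used in the proof above (threshold value plus excess over the threshold) is what drives the 1/2 bound in this simple case as well.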

We note that Theorem 11 is constructive, and the thresholds can be computed to within an arbitrarily small error in polynomial time. For example, this can be done by sampling instances of , simulating the behavior of the prophet, and calculating an empirical average of the associated threshold values. See (?) for further details on this approach.
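As a sketch of this sampling approach, the skeleton below draws instances, runs a caller-supplied prophet policy, and averages the value kept in each (type, slot) position; the final halving rule is a placeholder assumption standing in for the paper’s exact per-slot threshold formula, and the instance and policy in the toy usage are hypothetical:

```python
import random
from collections import defaultdict

def estimate_thresholds(sample_instance, prophet_policy, n_samples=10_000):
    """Monte Carlo skeleton: sample instances, simulate the prophet, and
    average the value kept in each (type, slot) position.

    `sample_instance` and `prophet_policy` are assumed to be supplied by
    the caller; the policy returns (type, slot, value) triples for the
    prizes the prophet keeps.
    """
    totals = defaultdict(float)
    for _ in range(n_samples):
        for typ, slot, value in prophet_policy(sample_instance()):
            totals[(typ, slot)] += value
    # Empirical E[value kept in slot (i, j)], then halve it (placeholder
    # for the exact threshold formula).
    return {key: total / n_samples / 2 for key, total in totals.items()}

# Toy usage: one type, three boxes, prophet keeps the single largest value,
# so the estimate concentrates around E[max of 3 uniforms] / 2 = 0.375.
random.seed(1)
taus = estimate_thresholds(
    sample_instance=lambda: [random.random() for _ in range(3)],
    prophet_policy=lambda values: [(0, 1, max(values))],
)
print(taus)
```

Standard Chernoff-type arguments bound the number of samples needed for any desired accuracy, which is what makes the polynomial-time claim go through.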

Applying the reduction from Theorem 5 to the prophet inequality in Theorem 11 yields a polynomial time algorithm for the multi-arm online Pandora’s box problem.

Corollary 12

There is an -approximation algorithm for the online multi-arm Pandora’s box problem.

5 Conclusion and Open Problems

We have presented a general reduction method for translating approximation algorithms in the prophet inequality setting into corresponding approximation algorithms in the online Pandora’s box setting, with applications to information learning. Further, we have introduced a novel multi-armed bandit Pandora’s box variation in the context of reinforcement learning where our methods apply. Along the way, we have considered many generalizations of the Pandora’s box problem, including allowing distributions on costs and the use of types.

One open challenge is to relax the assumption that values, costs, and/or types of different boxes are independent. One could also generalize to objectives beyond maximizing the sum of the values selected minus the cost of opening boxes. For example, what happens if the cost of opening a box increases as more boxes are opened? Finally, the multi-arm prophet inequality is a special case of a more general class of stochastic optimization problems, and it would be interesting to extend to more general scenarios. For example, can one extend the result to distributions that vary across time, or to general matroid constraints over the set of boxes that can be opened?

Acknowledgments

Hossein Esfandiari was supported in part by NSF grants CCF-1320231 and CNS-1228598.

MohammadTaghi HajiAghayi was supported in part by NSF CAREER award CCF-1053605, NSF AF:Medium grant CCF-1161365, NSF BIGDATA grant IIS-1546108, NSF SPX grant CCF-1822738, UMD AI in Business and Society Seed Grant and UMD Year of Data Science Program Grant. Part of this work was done while visiting Microsoft Research New England.

Michael Mitzenmacher was supported in part by NSF grants CCF-1563710, CCF-1535795, CCF-1320231, and CNS-1228598; also, part of this work was done while visiting Microsoft Research New England.

References

  • [Alaei, Hajiaghayi, and Liaghat 2012] Alaei, S.; Hajiaghayi, M.; and Liaghat, V. 2012. Online prophet-inequality matching with applications to ad allocation. In ACM EC, 18–35.
  • [Alaei 2014] Alaei, S. 2014. Bayesian combinatorial auctions: Expanding single buyer mechanisms to many buyers. SIAM Journal on Computing 43(2):930–972.
  • [Babaioff, Immorlica, and Kleinberg 2007] Babaioff, M.; Immorlica, N.; and Kleinberg, R. 2007. Matroids, secretary problems, and online mechanisms. In SODA, 434–443.
  • [Esfandiari et al. 2017] Esfandiari, H.; Hajiaghayi, M.; Liaghat, V.; and Monemizadeh, M. 2017. Prophet secretary. SIAM Journal on Discrete Mathematics 31(3):1685–1701.
  • [Feldman, Svensson, and Zenklusen 2015] Feldman, M.; Svensson, O.; and Zenklusen, R. 2015. A simple O (log log (rank))-competitive algorithm for the matroid secretary problem. In SODA, 1189–1201.
  • [Gittins and Jones 1974] Gittins, J., and Jones, D. 1974. A dynamic allocation index for the sequential design of experiments. In Progress in Statistics, 241–266.
  • [Goel and Mehta 2008] Goel, G., and Mehta, A. 2008. Online budgeted matching in random input models with applications to adwords. In SODA, 982–991.
  • [Guruganesh and Singla 2017] Guruganesh, G. P., and Singla, S. 2017. Online matroid intersection: Beating half for random arrival. In IPCO, 241–253.
  • [Hajiaghayi, Kleinberg, and Parkes 2004] Hajiaghayi, M. T.; Kleinberg, R.; and Parkes, D. C. 2004. Adaptive limited-supply online auctions. In ACM EC, 71–80.
  • [Karande, Mehta, and Tripathi 2011] Karande, C.; Mehta, A.; and Tripathi, P. 2011. Online bipartite matching with unknown distributions. In STOC, 587–596.
  • [Kesselheim et al. 2013] Kesselheim, T.; Radke, K.; Tönnis, A.; and Vöcking, B. 2013. An optimal online algorithm for weighted bipartite matching and extensions to combinatorial auctions. In ESA, 589–600.
  • [Kleinberg and Weinberg 2012] Kleinberg, R., and Weinberg, S. M. 2012. Matroid prophet inequalities. In STOC, 123–136.
  • [Korula and Pál 2009] Korula, N., and Pál, M. 2009. Algorithms for secretary problems on graphs and hypergraphs. In ICALP. 508–520.
  • [Krengel and Sucheston 1977] Krengel, U., and Sucheston, L. 1977. Semiamarts and finite values. Bulletin of the American Mathematical Society.
  • [Krengel and Sucheston 1978] Krengel, U., and Sucheston, L. 1978. On semiamarts, amarts, and processes with finite value. Advances in Prob 4:197–266.
  • [Lachish 2014] Lachish, O. 2014. O (log log rank) competitive ratio for the matroid secretary problem. In FOCS, 326–335.
  • [Mahdian and Yan 2011] Mahdian, M., and Yan, Q. 2011. Online bipartite matching with random arrivals: an approach based on strongly factor-revealing lps. In STOC, 597–606.
  • [Hajiaghayi, Kleinberg, and Sandholm 2007] Hajiaghayi, M. T.; Kleinberg, R.; and Sandholm, T. 2007. Automated online mechanism design and prophet inequalities. In AAAI, 58–65.
  • [Dütting et al. 2017] Dütting, P.; Feldman, M.; Kesselheim, T.; and Lucier, B. 2017. Prophet inequalities made easy: Stochastic optimization by pricing non-stochastic inputs. In FOCS, 540–551.
  • [Kleinberg, Waggoner, and Weyl 2016] Kleinberg, R.; Waggoner, B.; and Weyl, E. G. 2016. Descending price optimally coordinates search. In ACM EC, 23–24.
  • [Samuel-Cahn 1984] Samuel-Cahn, E. 1984. Comparison of threshold stop rules and maximum for independent nonnegative random variables. Annals of Probability 12(4):1213–1216.
  • [Singla 2018] Singla, S. 2018. The price of information in combinatorial optimization. In SODA, 2523–2532.
  • [Weitzman 1979] Weitzman, M. L. 1979. Optimal search for the best alternative. Econometrica 47(3):641–654.
  • [Yan 2011] Yan, Q. 2011. Mechanism design via correlation gap. In SODA, 710–719.