Secretary Problems with Non-Uniform Arrival Order

For a number of problems in the theory of online algorithms, it is known that the assumption that elements arrive in uniformly random order enables the design of algorithms with much better performance guarantees than under worst-case assumptions. The quintessential example of this phenomenon is the secretary problem, in which an algorithm attempts to stop a sequence at the moment it observes the maximum value in the sequence. As is well known, if the sequence is presented in uniformly random order there is an algorithm that succeeds with probability $1/e$, whereas no non-trivial performance guarantee is possible if the elements arrive in worst-case order.

In many of the applications of online algorithms, it is reasonable to assume there is some randomness in the input sequence, but unreasonable to assume that the arrival ordering is uniformly random. This work initiates an investigation into relaxations of the random-ordering hypothesis in online algorithms, by focusing on the secretary problem and asking what performance guarantees one can prove under relaxed assumptions. Toward this end, we present two sets of properties of distributions over permutations as sufficient conditions, called the $(p,q,\delta)$-block-independence property and the $(k,\delta)$-uniform-induced-ordering property. We show these two are asymptotically equivalent by borrowing techniques from approximation theory. Moreover, we show they both imply the existence of secretary algorithms with constant probability of correct selection, approaching the optimal constant $1/e$ as the relevant parameters of the property tend towards their extreme values. Both of these properties are significantly weaker than the usual assumption of uniform randomness; we substantiate this by providing several constructions of distributions that satisfy $(p,q,\delta)$-block-independence. As one application of our investigation, we prove that $\Theta(\log \log n)$ is the minimum entropy of any permutation distribution that permits constant probability of correct selection in the secretary problem with $n$ elements. While our block-independence condition is sufficient for constant probability of correct selection, it is not necessary; however, we present complexity-theoretic evidence that no simple necessary and sufficient criterion exists. Finally, we explore the extent to which the performance guarantees of other algorithms are preserved when one relaxes the uniform random ordering assumption to $(p,q,\delta)$-block-independence, obtaining a positive result for Kleinberg’s multiple-choice secretary algorithm and a negative result for the weighted bipartite matching algorithm of Korula and Pál.

## 1 Introduction

A recurring theme in the theory of online algorithms is that algorithms may perform much better when their input is in (uniformly) random order than when the ordering is worst-case. The quintessential example of this phenomenon is the secretary problem, in which an algorithm attempts to stop a sequence at the moment it observes the maximum value in the sequence. As is well known, if the sequence is presented in uniformly random order there is an algorithm that succeeds with probability $1/e$, whereas no non-trivial performance guarantee is possible if the elements arrive in worst-case order.
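A minimal simulation of this classical guarantee can make the phenomenon concrete; this is an illustrative sketch (the function names and parameter choices are ours, not from the paper), observing roughly $n/e$ items and then accepting the first record:

```python
import random

def secretary_trial(n, r):
    """One run of the classical rule: observe the first r items, then accept
    the first item whose value beats everything seen so far."""
    values = random.sample(range(10 * n), n)  # distinct values in uniformly random order
    best_seen = max(values[:r]) if r > 0 else float("-inf")
    for v in values[r:]:
        if v > best_seen:
            return v == max(values)  # accepted this item; correct iff it is the global max
    return False  # reached the end without accepting anything

def success_rate(n, trials=10000):
    r = round(n / 2.718281828)  # observe roughly n/e items
    return sum(secretary_trial(n, r) for _ in range(trials)) / trials

# Empirically close to 1/e ≈ 0.368 for moderate n.
print(success_rate(100))
```

Under worst-case order (say, values arriving in increasing order with the maximum hidden at a random position chosen adversarially against the rule), no comparable guarantee survives.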

In many of the applications of online algorithms, it is reasonable to assume there is some randomness in the input sequence, but unreasonable to assume that the input ordering is uniformly random. It is therefore of interest to ask which algorithms have robust performance guarantees, in the sense that the performance guarantee holds not only when the input order is drawn from the uniform distribution, but whenever the input order is drawn from a reasonably broad family of distributions that includes the uniform one. In other words, we seek relaxations of the standard random-ordering hypothesis which are weak enough to include many distributions of interest, but strong enough to enable one to prove the same (or qualitatively similar) performance guarantees for online algorithms.

This work initiates an investigation into relaxations of the random-ordering hypothesis in online algorithms, by focusing on the secretary problem and asking what performance guarantees one can prove under relaxed assumptions. In the problems we consider there are three parties: an adversary that assigns values to items, nature which permutes the items into a random order, and an algorithm that observes the items and their values in the order specified by nature. To state our results, let us say that a distribution over permutations is secretary-admissible (abbreviated s-admissible) if it is the case that when nature uses this distribution to sample the ordering of items, there exists an algorithm that is guaranteed at least a constant probability of selecting the element of maximum value, no matter what values the adversary assigns to elements. If this constant probability approaches $1/e$ as the number of elements, $n$, goes to infinity, we say that the distribution is secretary-optimal (s-optimal).

Question 1: What natural properties of a distribution suffice to guarantee that it is s-admissible? What properties suffice to guarantee that it is s-optimal?

For example, rather than assuming that the ordering of the entire $n$-tuple of items is uniformly random, suppose we fix a constant $k$ and assume that for every $k$-tuple of distinct items, the relative order in which they appear in the input sequence is $\delta$-close to uniform. Does this imply that the distribution is s-admissible? In §2 we formalize this $(k,\delta)$-uniform-induced-ordering property (UIOP), and we prove that it implies s-admissibility for $k \ge 3$ and approaches s-optimality as $k \to \infty$ and $\delta \to 0$. To prove this, we relate the uniform-induced-ordering property to another property, the $(p,q,\delta)$-block-independence property (BIP), which may be of independent interest. Roughly speaking, the block-independence property asserts that the joint distribution of arrival times of any $p$ distinct elements, when considered at coarse enough granularity, is $\delta$-close to $p$ i.i.d. samples from the uniform distribution. While this property may sound much stronger than the UIOP, we show that it is actually implied by the UIOP for sufficiently large $k$ and small $\delta$.

To substantiate the notion that these properties are satisfied by many interesting distributions that are far from uniform, we show that they apply to several natural families of permutation distributions, including the uniform distribution over almost every multiset of permutations of polylogarithmic size, and the distribution over linear orderings defined by taking any $n$ sufficiently “incoherent” vectors and projecting them onto a random line.

A distinct but related topic in the theory of computing is pseudorandomness, which shares a similar emphasis on showing that performance guarantees of certain classes of algorithms are preserved when one replaces the uniform distribution over inputs with suitably chosen non-uniform distributions, specifically those having low entropy. While our interest in s-admissibility and the -UIOP is primarily motivated by the considerations of robustness articulated earlier, the analogy with pseudorandomness prompts a natural set of questions.

Question 2: What is the minimum entropy of an s-admissible distribution? What is the minimum entropy of a distribution that satisfies the $(k,\delta)$-UIOP? Is there an explicit construction that achieves the minimum entropy?

In §2 and §3 we supply matching upper and lower bounds to answer the first two questions. The answer is the same in both cases, and it is surprisingly small: $\Theta(\log \log n)$ bits. Moreover, $O(\log \log n)$ bits suffice not just for s-admissibility, but for s-optimality! We also supply an explicit construction, using Reed-Solomon codes, of distributions with $O(\log \log n)$ bits of entropy that satisfy all of these properties.

Given that the $(k,\delta)$-UIOP is a sufficient condition for s-admissibility, that it is satisfied in every natural construction of s-admissible distributions that we know of, and that the minimum entropy of $(k,\delta)$-UIOP distributions matches the minimum entropy of s-admissible distributions, it is tempting to hypothesize that the $(k,\delta)$-UIOP (or something very similar) is both necessary and sufficient for s-admissibility.

Question 3: Find a natural necessary and sufficient condition that characterizes the property of s-admissibility.

In §4 we show that, unfortunately, this is probably impossible. We construct a strange distribution over input orderings that is s-admissible, but any algorithm achieving constant probability of correct selection against it must use a stopping rule that cannot be computed by circuits of subexponential size. The construction makes use of a coding-theoretic construction that may be of independent interest: a binary error-correcting code such that even after the adversarial erasure of a large fraction of the symbols of the received vector, most messages can still be uniquely decoded, even if a constant fraction of the remaining symbols are adversarially corrupted.

Finally, we broaden our scope and consider other online problems with randomly-ordered inputs.

Question 4: Are the performance guarantees of other online algorithms in the uniform-random-order model (approximately) preserved when one relaxes the assumption about the input order to the -UIOP or the -BIP? If the performance guarantee is not always preserved in general, what additional properties of an algorithm suffice to ensure that its performance guarantee is preserved?

This is an open-ended question, but we take some initial steps toward answering it by looking at two generalizations of the secretary problem: the multiple-choice secretary problem (a.k.a. the uniform matroid secretary problem) and the online bipartite weighted matching problem. We show that the algorithm of Kleinberg [25] for the former problem preserves its performance guarantee, and the algorithm of Korula and Pál [26] for the latter problem does not.

##### Related Work.

The secretary problem was solved by Lindley [28] and Dynkin [15]. A sequence of papers relating secretary problems to online mechanism design [20, 25, 5] touched off a flurry of CS research during the past 10 years. Much of this research has focused on the so-called matroid secretary problem, which remains unsolved despite a string of breakthroughs including a recent pair of $O(\log \log r)$-competitive algorithms [27, 18], where $r$ is the matroid rank. Generalizations are known for weighted matchings in graphs and hypergraphs [14, 23, 26], independent sets [19], knapsack constraints [4], and submodular payoff functions [7, 17], among others. Of particular relevance to our work is the free order model [21]; our results on the minimum entropy s-admissible distribution can be regarded as a randomness-efficient secretary algorithm in the free-order model.

The uniform-random-ordering hypothesis has been applied to many other problems in online algorithms, perhaps most visibly to the AdWords problem [12, 16] and its generalizations to online linear programming with packing constraints [2, 13, 24, 32], and online convex programming [1]. Applications of the random-order hypothesis in minimization settings are more rare; see [29, 30] for applications in the context of facility location and network design.

In seeking a middle ground between worst-case and average-case analysis, our work contributes to a broad-based research program going by the name of “beyond worst-case analysis” [34]. In terms of motivation, there are clear conceptual parallels between our paper and the work of Mitzenmacher and Vadhan [31], who study hashing and identify hypotheses on the data-generating process, much weaker than uniform randomness, under which random hashing using a 2-universal hash family has provably good performance, although at a technical level our paper bears no relation to theirs.

The properties of permutation distributions that we identify in our work bear a resemblance to almost $k$-wise independent permutations (e.g., [22]), but the $(k,\delta)$-UIOP and $(p,q,\delta)$-BIP are much weaker, and consequently permutation distributions satisfying these properties are much more prevalent than almost $k$-wise independent permutations.

##### Setting and Notations.

We consider problems in which an algorithm selects one or more elements from a set of $n$ items. Items are presented sequentially, and an algorithm may only select items at the time when they are presented. In the secretary problem the items are totally ordered by value, and the algorithm is allowed to select only one element of the input sequence, with the goal of selecting the item of maximum value. Algorithms for the secretary problem are assumed to be comparison-based, meaning their decision whether to select the item presented at time $t$ must be based only on the relative ordering (by value) of the first $t$ elements that arrived. (This assumption of comparison-based algorithms is standard in the literature on secretary problems. Samuels [35] proved that when the input order is uniformly random, it is impossible to achieve probability of correct selection $1/e + \varepsilon$ for any constant $\varepsilon > 0$, even if the algorithm is allowed to observe the values.) Algorithms are evaluated according to their probability of correct selection, i.e., the probability of selecting the item of maximum value.

We assume that the set of items is $[n] = \{1, \dots, n\}$. The order in which items are presented is then represented by a permutation $\pi$ of $[n]$, where $\pi(x)$ denotes the position of item $x$ in the input sequence. Similarly, the ordering of items by value can be represented by a permutation $\sigma$ of $[n]$, where $\sigma(i) = x$ means that the $i$th largest item is $x$. Then, the input sequence observed by the algorithm is completely described by the composition $\pi \circ \sigma$.
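To make this bookkeeping concrete, the following sketch (the function name is ours) computes, from an arrival permutation and a value ranking, the sequence of value-ranks in arrival order, which is exactly the information available to a comparison-based algorithm:

```python
def observed_sequence(pi, sigma):
    """pi[x] = arrival position of item x; sigma[i] = the i-th largest item
    (both 0-indexed). Returns, for each arrival position, the value-rank of
    the item arriving there -- all a comparison-based algorithm can see."""
    n = len(pi)
    rank_at_position = [None] * n
    for i in range(n):                       # i = value rank, sigma[i] = item
        rank_at_position[pi[sigma[i]]] = i
    return rank_at_position

# Three items: item 2 arrives first, then item 0, then item 1.
pi = [1, 2, 0]
# Item 1 is the largest, then item 2, then item 0.
sigma = [1, 2, 0]
print(observed_sequence(pi, sigma))  # [1, 2, 0]: the first arrival has value-rank 1
```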

## 2 Sufficient Properties of Non-Uniform Probability Distributions

In §1, we introduced two properties of non-uniform probability distributions that suffice to ensure the existence of a secretary algorithm with constant probability of correct selection. (In other words, the two properties imply s-admissibility.) We begin by formally defining these two properties.

###### Definition 1.

A distribution $D$ over permutations of $[n]$ satisfies the $(k,\delta)$-uniform-induced-ordering property, abbreviated $(k,\delta)$-UIOP, if and only if, for every $k$ distinct items $x_1, \dots, x_k$, if $\pi$ is a random sample from $D$ then $\Pr[\pi(x_1) < \pi(x_2) < \dots < \pi(x_k)] \ge (1-\delta)\frac{1}{k!}$.

The $(k,\delta)$-uniform-induced-ordering property is a very natural assumption, and it is often rather easy to show that a given probability distribution fulfills it. We will demonstrate this with a few examples in §2.3. However, it is not clear how to analyze algorithms for secretary problems based on this property directly. To this end, the more technical $(p,q,\delta)$-block-independence property is more helpful. We show this by analyzing the classic algorithm for the secretary problem in Section 2.1 and the multiple-choice (uniform matroid) secretary problem in Section 5. However, one of our main results, in Section 2.2, is that these two properties are in fact equivalent in the limit as the parameters $k, p, q \to \infty$ and $\delta \to 0$.

###### Definition 2.

Given a positive integer $q$, partition $[n]$ into $q$ consecutive disjoint blocks of size between $\lfloor n/q \rfloor$ and $\lceil n/q \rceil$ each, denoted by $B_1, \dots, B_q$. A permutation distribution $D$ satisfies the $(p,q,\delta)$-block-independence property, abbreviated $(p,q,\delta)$-BIP, if for any distinct $x_1, \dots, x_p \in [n]$ and any $b_1, \dots, b_p \in [q]$,

$$ \Pr\left[\,\bigwedge_{j \in [p]} \pi(x_j) \in B_{b_j}\right] \;\ge\; (1-\delta)\left(\frac{1}{q}\right)^{p}. $$

Note that $b_1, \dots, b_p$ do not necessarily have to be distinct. To simplify notation, given a permutation $\pi$ of $[n]$, we define a function $\pi_B : [n] \to [q]$ by setting $\pi_B(x) = j$ if and only if $\pi(x) \in B_j$.
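The block structure is easy to work with computationally. The sketch below (helper names and parameter choices are ours, purely illustrative) maps arrival positions to block indices and empirically estimates, for the uniform distribution over a sampled support and one fixed pair of items, the worst ratio of a block-pattern probability to $(1/q)^p$; a ratio near $1$ corresponds to a small $\delta$:

```python
import itertools, random

def block_of(position, n, q):
    """Map an arrival position in [0, n) to its block index in [0, q)."""
    return position * q // n  # q consecutive blocks of nearly equal size

def bip_min_ratio(perms, n, p, q):
    """Empirically probe (p, q, delta)-block-independence for the uniform
    distribution over `perms`, using the fixed items 0, ..., p-1: return the
    smallest observed Pr[block pattern] / (1/q)^p over all block patterns."""
    items = range(p)
    counts = {}
    for pi in perms:                      # pi[x] = arrival position of item x
        pattern = tuple(block_of(pi[x], n, q) for x in items)
        counts[pattern] = counts.get(pattern, 0) + 1
    worst = min(counts.get(b, 0) for b in itertools.product(range(q), repeat=p))
    return (worst / len(perms)) / (1 / q) ** p

random.seed(0)
n, p, q = 30, 2, 3
perms = [random.sample(range(n), n) for _ in range(20000)]
# For (near-)uniform permutations the ratio approaches 1, i.e. delta ≈ 0.
print(bip_min_ratio(perms, n, p, q))
```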

### 2.1 Secretary Algorithms and the (p,q,δ)-block-independence property

Next, we will analyze the standard threshold algorithm for the secretary problem under probability distributions that only fulfill the $(p,q,\delta)$-block-independence property rather than being uniform. The algorithm only observes the items in the first $T$ blocks, for a threshold block $T$. Afterwards, it accepts the first item whose value exceeds all values seen up to this point. Under a uniform distribution (with threshold roughly $n/e$), this algorithm picks the best item with probability at least $1/e$. We show that already for small constant values of $p$ and $q$ and rather large constant values of $\delta$ this algorithm has constant success probability. At the same time, for large $p$ and $q$ and small $\delta$, the probability converges to $1/e$.

###### Theorem 1.

Under a $(p,q,\delta)$-block-independent probability distribution, the standard secretary algorithm with a suitably chosen threshold block $T$ picks the best item with probability at least $(1-\delta)\bigl(\frac{1}{e} - \varepsilon(p,q)\bigr)$, where $\varepsilon(p,q) \to 0$ as $p, q \to \infty$.

###### Proof Sketch.

Let $T$ denote the index of the block in which the threshold is located. Furthermore, let $x_i$ be the $i$th best item. We condition on the event that $x_1$ comes in the block with index $i > T$. To ensure that our algorithm picks this item, it suffices that $x_2$ comes in blocks $1, \dots, T-1$. Alternatively, we also pick $x_1$ if $x_2$ comes in blocks $i+1, \dots, q$ and $x_3$ comes in blocks $1, \dots, T-1$. Continuing this argument, we get

$$ \Pr[\text{correct selection}] \;\ge\; \sum_{i=T+1}^{q}\ \sum_{j=2}^{p} \Pr\bigl[\pi_B(x_1) = i,\ \pi_B(x_2), \dots, \pi_B(x_{j-1}) > i,\ \pi_B(x_j) < T\bigr]. $$

Note that the $(p,q,\delta)$-BIP implies the $(p',q,\delta)$-BIP for any $p' \le p$, simply by marginalizing over the remaining indices in the tuple. This gives us:

$$ \Pr[\text{correct selection}] \;\ge\; \sum_{i=T+1}^{q}\ \sum_{j=2}^{p} (1-\delta)\,\frac{1}{q}\left(\frac{q-i}{q}\right)^{j-2}\frac{T-1}{q}, $$

and the theorem follows after manipulating the expression on the right side and applying some standard bounds. ∎

### 2.2 Relationship Between the Two Properties

We will show that the two properties defined in the preceding section are in some sense equivalent in the limit as the parameters $k, p, q \to \infty$ and $\delta \to 0$. (For $k = 2$, a distribution satisfying the $(2,0)$-UIOP is not even necessarily s-admissible; this is an easy consequence of the lower bound in §3 and the fact that the $(2,0)$-UIOP is achieved by a distribution with support size 2, which uniformly randomizes between a single permutation and its reverse. Already for $k = 3$ and any constant $\delta < 1$, the $(3,\delta)$-UIOP implies s-admissibility; this is shown in Appendix A.)

Our first result is relatively straightforward: any probability distribution that fulfills the $(p,q,\delta)$-BIP also fulfills the $(p,\delta')$-UIOP for a $\delta'$ that tends to $0$ as $\delta \to 0$ and $q \to \infty$. The (easy) proof is deferred to Appendix B.1.2.

###### Theorem 2.

If a distribution over permutations fulfills the $(p,q,\delta)$-BIP, then it also fulfills the $(p,\delta')$-UIOP, where $\delta' \to 0$ as $\delta \to 0$ and $q \to \infty$.

The other direction is far less obvious. Observe that the $(k,\delta)$-uniform-induced-ordering property works in a purely local sense: even for a single item $x$, the distribution of its position $\pi(x)$ can be far from uniform. For example, the case $k = 2$ is even fulfilled by a two-point distribution that only includes one permutation and its reverse. Then $\pi(x)$ can only attain two different values. Nevertheless, we have the following result.

###### Theorem 3.

If a distribution over permutations fulfills the $(k,\delta)$-uniform-induced-ordering property, then it also satisfies the $(p,q,\delta')$-block-independence property, where $\delta' \to 0$ as $k \to \infty$ and $\delta \to 0$, as $n$ goes to infinity.

The proof applies the theory of approximation of functions, which addresses the question of how well one can approximate arbitrary functions by polynomials. The main insight underlying the proof is the following. If $D$ satisfies the $(k,\delta)$-UIOP, then for any tuple of distinct elements $x_1, \dots, x_p$, if one defines random variables $X_i = \pi(x_i)/n$, then the expected value of any monomial of total degree at most $k$ in the variables $X_1, \dots, X_p$ approximates the expected value of that same monomial under the distribution of a uniformly-random permutation. With this lemma in hand, proving Theorem 3 becomes a matter of quantifying how well the indicator function of a (multi-dimensional) rectangle can be approximated by low-degree polynomials. Approximation theory furnishes such estimates readily. To make the proof sketch concrete, we start with some definitions and notation from approximation theory; see, e.g., the textbook by Carothers [10].

###### Definition 3 ([10]).

If $f$ is any bounded function over $[0,1]$, we define the sequence of Bernstein polynomials for $f$ by

$$ (B_d(f))(x) \;=\; \sum_{k=0}^{d} f(k/d)\binom{d}{k}x^{k}(1-x)^{d-k}, \qquad 0 \le x \le 1. \tag{1} $$
###### Remark 1.

$B_d(f)$ is a polynomial of degree at most $d$.

###### Definition 4 ([10]).

The modulus of continuity of a bounded function $f$ over $[a,b]$ is defined by

$$ \omega_f(\delta) \;=\; \sup\bigl\{\,|f(x_1) - f(x_2)| \;:\; x_1, x_2 \in [a,b],\ |x_1 - x_2| \le \delta \,\bigr\}. \tag{2} $$
###### Remark 2.

A bounded function $f$ is continuous over a closed interval $[a,b]$ if and only if $\lim_{\delta \to 0^+} \omega_f(\delta) = 0$. Moreover, over an arbitrary interval, $f$ is uniformly continuous if and only if $\lim_{\delta \to 0^+} \omega_f(\delta) = 0$.

We are now ready to state our main ingredient, Bernstein’s approximation theorem, which shows that bounded functions with enough continuity are well approximated by Bernstein polynomials.

###### Theorem 4 ([10]).

For any bounded function $f$ over $[0,1]$ we have

$$ \|f - B_d(f)\|_{\infty} \;\le\; \frac{3}{2}\,\omega_f\!\left(\frac{1}{\sqrt{d}}\right), \tag{3} $$

where for any bounded functions $f$ and $g$, $\|f - g\|_{\infty} = \sup_{0 \le x \le 1} |f(x) - g(x)|$.
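The theorem is easy to check numerically. In the sketch below (function names are ours), $f(x) = |x - 1/2|$ has modulus of continuity $\omega_f(t) = t$, so the bound (3) specializes to $\frac{3}{2}/\sqrt{d}$:

```python
from math import comb

def bernstein(f, d):
    """Return the degree-d Bernstein approximation B_d(f) on [0, 1]."""
    coeffs = [f(k / d) for k in range(d + 1)]
    def B(x):
        return sum(c * comb(d, k) * x**k * (1 - x)**(d - k)
                   for k, c in enumerate(coeffs))
    return B

def sup_error(f, d, grid=1000):
    """Approximate ||f - B_d(f)||_inf on a uniform grid of [0, 1]."""
    B = bernstein(f, d)
    return max(abs(f(i / grid) - B(i / grid)) for i in range(grid + 1))

f = lambda x: abs(x - 0.5)          # omega_f(t) = t, so the bound is 1.5/sqrt(d)
for d in (16, 64, 256):
    assert sup_error(f, d) <= 1.5 / d**0.5
print(sup_error(f, 256))
```

The observed error decays like $1/\sqrt{d}$, matching the modulus-of-continuity rate the theorem predicts.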

###### Proof of Theorem 3.

To prove our claim, we start by showing that the $(k,\delta)$-uniform-induced-ordering property forces the arrival times of items to have almost the same higher-order moments as uniform independent random variables. More precisely, we have the following lemma (the proof is provided in Appendix B.1.2).

###### Lemma 1.

Suppose $\pi$ is drawn from a permutation distribution satisfying the $(k,\delta)$-uniform-induced-ordering property, and $\{x_1, \dots, x_p\} \subseteq [n]$ is an arbitrary set of items. Let $U_1, \dots, U_p$ be i.i.d. uniform random variables on $[0,1]$, and define random variables $X_i = \pi(x_i)/n$ for all $i \in [p]$. Then for every monomial $M$ of total degree at most $k$ we have $\bigl|\mathbf{E}[M(X_1, \dots, X_p)] - \mathbf{E}[M(U_1, \dots, U_p)]\bigr| \le \delta'$, where $\delta' \to 0$ as $\delta \to 0$ and $n \to \infty$.

Given Lemma 1, the key idea for the rest of the proof is, roughly speaking, to view probabilities as expectations of indicator functions, and then to approximate the indicator functions by polynomials. To compute the resulting probabilities, all we then need are moments, which by Lemma 1 are almost equal to those of uniform independent random variables. Rigorously, we prove the following probabilistic lemma using this idea. (The proof is provided in Appendix B.1.2.)

###### Lemma 2.

Let $U_1, \dots, U_p$ be i.i.d. uniform random variables on $[0,1]$. Furthermore, let $X_1, \dots, X_p$ be random variables over $[0,1]$ such that for every monomial $M$ of total degree at most $d$ we have $\bigl|\mathbf{E}[M(X_1, \dots, X_p)] - \mathbf{E}[M(U_1, \dots, U_p)]\bigr| \le \delta$. Then for any disjoint intervals $[a_1,b_1], \dots, [a_p,b_p] \subseteq [0,1]$ whose endpoints $a_i$ and $b_i$ are multiples of $1/q$, we have:

$$ \Pr\left[\,\bigwedge_{i=1}^{p}\bigl(X_i \in [a_i, b_i]\bigr)\right] \;\ge\; \left(\prod_{i=1}^{p}(b_i - a_i)\right)(1-\delta) \;-\; \frac{7p}{d^{1/4}}. \tag{4} $$

Now, by combining Lemma 1 and Lemma 2, we verify the $(p,q,\delta')$-block-independence property. Apply Lemma 2 to the intervals corresponding to the blocks $B_{b_1}, \dots, B_{b_p}$, whose lengths are $1/q$ each. The deviation of the resulting probability from the desired $(1-\delta')(1/q)^p$ is governed by the error term $7p/d^{1/4}$, which must be small compared with $(1/q)^p$; this requires $d^{1/4}$ to be large compared with $p\,q^{p}$. Moreover, the monomial degree must satisfy $d \le k$. So, choosing $d = k$ and letting $k \to \infty$ and $\delta \to 0$, the error vanishes. This completes the proof. ∎

### 2.3 Constructions of Probability Distributions Implying the Properties

#### 2.3.1 Randomized One-Dimensional Projections

In this section we present one natural construction leading to a distribution that satisfies the $(k,\delta)$-UIOP. The starting point for the construction is an $n$-tuple of vectors $v_1, \dots, v_n \in \mathbb{R}^m$. If one sorts these vectors according to a random one-dimensional projection (i.e., ranks the vectors in increasing order of $\langle v_i, w \rangle$, for a random $w$ drawn from a spherically symmetric distribution), when does the resulting random ordering satisfy the $(k,\delta)$-UIOP? Note that if any $k$ of these vectors comprise an orthonormal $k$-tuple and one ranks them in increasing order of $\langle v_i, w \rangle$, where $w$ is drawn from a spherically symmetric distribution, then a trivial symmetry argument shows that the induced ordering of these $k$ vectors is uniformly random. Intuitively, then, if the vectors are sufficiently “incoherent”, then any $k$-tuple of them should be nearly orthonormal and their induced ordering when projected onto the 1-dimensional subspace spanned by $w$ should be approximately uniformly random. The present section is devoted to making this intuition quantitative. We begin by recalling the definition of the restricted isometry property [9].

###### Definition 5.

A matrix $X \in \mathbb{R}^{m \times n}$ satisfies the restricted isometry property (RIP) of order $k$ with restricted isometry constant $\delta_k$ if the inequalities

$$ (1-\delta_k)\|x\|^2 \;\le\; \|X_T x\|^2 \;\le\; (1+\delta_k)\|x\|^2 $$

hold for every submatrix $X_T$ composed of $k$ columns of $X$ and every vector $x \in \mathbb{R}^k$. Here $\|\cdot\|$ denotes the Euclidean norm.

Several random matrix distributions are known to give rise to matrices satisfying the RIP with high probability. The simplest such distribution is a random $m$-by-$n$ matrix with i.i.d. entries drawn from the normal distribution $\mathcal{N}(0, 1/m)$. It is known [6, 9] that, with high probability, such a matrix satisfies the RIP of order $k$ with restricted isometry constant $\delta_k$ provided that $m = \Omega(k \log(n/k)/\delta_k^2)$. Even if the columns of $X$ are not random, if they are sufficiently “incoherent” unit vectors, meaning that $\langle v_i, v_j \rangle = 1$ if $i = j$ and $|\langle v_i, v_j \rangle| \le \mu$ otherwise, for a sufficiently small coherence $\mu$, then $X$ satisfies the RIP. Using this idea, we prove the following theorem (with proof provided in Appendix B.1.3).

###### Theorem 5.

Let $v_1, \dots, v_n$ be the columns of a matrix $X$ that satisfies the RIP of order $k$ with restricted isometry constant $\delta_k$. If $w$ is drawn at random from a spherically symmetric distribution and we use $w$ to define a permutation of $[n]$ by sorting the elements in order of increasing $\langle v_i, w \rangle$, the resulting distribution over permutations satisfies the $(k,\delta)$-UIOP for a $\delta$ that tends to $0$ as $\delta_k \to 0$.
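The construction is straightforward to experiment with. The following sketch (all names and parameter choices are ours, purely illustrative) projects nearly orthonormal random unit vectors onto a random Gaussian direction and tabulates how often each relative order of a fixed triple appears; for incoherent vectors the frequencies are close to $1/3! = 1/6$:

```python
import random

def projected_permutation(vectors, rng):
    """Sort the vectors by their inner product with a random Gaussian
    direction; returns the induced ordering of the indices."""
    m = len(vectors[0])
    w = [rng.gauss(0.0, 1.0) for _ in range(m)]
    proj = [sum(v[j] * w[j] for j in range(m)) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: proj[i])

def induced_order_frequencies(vectors, k, trials, seed=1):
    """Frequency of each relative order of the first k indices; near-uniform
    (close to 1/k!) when the vectors are sufficiently incoherent."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(trials):
        order = projected_permutation(vectors, rng)
        key = tuple(i for i in order if i < k)
        counts[key] = counts.get(key, 0) + 1
    return {key: c / trials for key, c in counts.items()}

# Random unit vectors in high dimension are nearly orthonormal (coherence
# roughly 1/sqrt(m)), so the induced triple orderings are nearly uniform.
rng = random.Random(0)
m, n = 200, 6
vecs = []
for _ in range(n):
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]
    norm = sum(x * x for x in v) ** 0.5
    vecs.append([x / norm for x in v])
freqs = induced_order_frequencies(vecs, k=3, trials=6000)
print(freqs)  # each of the 3! = 6 orderings has frequency near 1/6
```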

#### 2.3.2 Constructions with Low Entropy

This subsection presents two constructions showing that there exist permutation distributions with entropy $O(\log \log n)$ satisfying the $(k,\delta)$-UIOP for arbitrarily large constant $k$ and arbitrarily small constant $\delta > 0$. The proof of the first result is an easy application of the probabilistic method (and is in Appendix B.1.4). The proof of the second result uses Reed-Solomon codes to supply an explicit construction.

###### Theorem 6.

Fix constants $k$ and $\delta > 0$. If $\Pi$ is a random $m$-element multiset of permutations of $[n]$, for a suitable $m$ polylogarithmic in $n$, then the uniform distribution over $\Pi$ fulfills the $(k,\delta)$-UIOP with probability at least $1 - o(1)$.
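The probabilistic-method claim is easy to probe empirically. The sketch below (the function `uiop_delta` and all parameter choices are ours, purely illustrative) estimates, for the uniform distribution over a sampled support of permutations, the worst multiplicative shortfall of an induced $k$-tuple ordering probability below $1/k!$; note that a support of size $m$ has entropy only $\log m$, far below that of a uniformly random permutation:

```python
import itertools, math, random

def uiop_delta(support, k, sample_tuples=200, seed=2):
    """Estimate the UIOP parameter delta for the uniform distribution over
    `support`: the worst shortfall of Pr[a given relative order of a k-tuple]
    below 1/k!, maximized over sampled k-tuples and all k! orders."""
    rng = random.Random(seed)
    n, m = len(support[0]), len(support)
    worst = 0.0
    for _ in range(sample_tuples):
        xs = rng.sample(range(n), k)
        counts = {}
        for pi in support:               # pi[x] = arrival position of item x
            order = tuple(sorted(range(k), key=lambda i: pi[xs[i]]))
            counts[order] = counts.get(order, 0) + 1
        for order in itertools.permutations(range(k)):
            p = counts.get(order, 0) / m
            worst = max(worst, 1.0 - p * math.factorial(k))
    return worst

rng = random.Random(0)
n, m = 50, 2000
support = [rng.sample(range(n), n) for _ in range(m)]
# A modest random support already looks nearly uniform on triples,
# while its entropy log2(m) ≈ 11 bits is tiny compared to log2(50!) ≈ 214.
print(uiop_delta(support, k=3))
```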

###### Theorem 7.

There is a distribution over permutations of $[n]$ that has entropy $O(\log \log n)$ and fulfills the $(k,\delta)$-uniform-induced-ordering property, where $\delta \to 0$ as $n \to \infty$.

To derive Theorem 7, we start by proving the following lemma.

###### Lemma 3.

For large enough $n$ and some constant $m$, there is a distribution over functions $h : [n] \to [m]$ with entropy $O(\log \log n)$ such that for any $w, w' \in [n]$, $w \ne w'$, we have $\Pr[h(w) = h(w')] \le \delta'$, where the constant $\delta' > 0$ can be made arbitrarily small by increasing $m$.

###### Proof.

We will define a random function $h$, parameterized by three random positions $y_1$, $y_2$, and $y_3$, as a composition of 8 functions, most of which are injective.

For $i \in \{1, 2, 3\}$, let $K_i$ be a suitable parameter and let $N_i$ be a prime power of the appropriate magnitude (note that for large enough $n$, we can always find a prime power between $N$ and $2N$). Let each $y_i$ be drawn independently and uniformly from $[N_i]$. This is the only randomization involved in the construction. It has entropy $\log N_1 + \log N_2 + \log N_3$.

Let $C_i$ be a Reed-Solomon code of message length $K_i$ and alphabet size $N_i$. This yields block length $N_i$ and distance $d_i = N_i - K_i + 1$. In other words, $C_i$ is a function $C_i : [N_i]^{K_i} \to [N_i]^{N_i}$ such that for any $w, w'$ with $w \ne w'$, the codewords $C_i(w)$ and $C_i(w')$ differ in at least $d_i$ components.

Furthermore, $y_i$ defines one position in each codeword of $C_i$. Given $y_i$, let $h_i : [N_i]^{N_i} \to [N_i]$ be the projection of a codeword of $C_i$ to its $y_i$th component, i.e., $h_i(c) = c_{y_i}$.

Finally, we observe that $N_i \le N_{i+1}^{K_{i+1}}$. So there is an injective mapping $g_i$, mapping alphabet symbols of $C_i$ to messages of $C_{i+1}$.

Overall, this defines a function $h = h_3 \circ C_3 \circ g_2 \circ h_2 \circ C_2 \circ g_1 \circ h_1 \circ C_1$, mapping messages of $C_1$ (and hence, via an injective identification of $[n]$ with a subset of them, items) to values in $[N_3]$.

Let $f_0$ denote this injective identification of $[n]$ with a subset of the messages of $C_1$, and for $i \in \{1, 2\}$ let $f_i = g_i \circ h_i \circ C_i \circ f_{i-1}$ denote the partial compositions.

Now let $w, w' \in [n]$, $w \ne w'$. Observe that all functions except for the projections $h_i$ are injective. Therefore the event $h(w) = h(w')$ can only occur if $h_i(C_i(f_{i-1}(w))) = h_i(C_i(f_{i-1}(w')))$ for some $i$ with $f_{i-1}(w) \ne f_{i-1}(w')$. As $C_i$ is a Reed-Solomon code with distance $d_i$, the codewords $C_i(f_{i-1}(w))$ and $C_i(f_{i-1}(w'))$ then differ in at least $d_i$ components. Therefore, the probability that they agree in the randomly chosen component $y_i$ is at most $1 - d_i/N_i$.

By the union bound, the combined probability that a collision occurs at one of the three levels is bounded by

$$ \Pr\left[\,\bigvee_{i=1}^{3} h_i(C_i(f_{i-1}(w))) = h_i(C_i(f_{i-1}(w')))\right] \;\le\; \sum_{i=1}^{3}\left(1 - \frac{d_i}{N_i}\right) \;\le\; 3\left(1 - \frac{d_3}{N_3}\right) \;\le\; \frac{3}{K_3}. \qquad \text{∎} $$
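The collision mechanism at each level of this construction is just polynomial evaluation at a random point: a Reed-Solomon codeword is the evaluation table of the message polynomial, and projecting it to a random component evaluates the polynomial at a random field element. A single-level sketch (the names and parameters are ours, and we use a prime field rather than a general prime power):

```python
import random

def poly_eval_hash(message, alpha, p):
    """Hash a message, viewed as the coefficients of a polynomial over F_p,
    to its evaluation at the point alpha (Horner's rule).  This equals the
    alpha-th component of the message's Reed-Solomon codeword."""
    acc = 0
    for c in reversed(message):
        acc = (acc * alpha + c) % p
    return acc

p = 1009                                          # prime alphabet size N
rng = random.Random(3)
msg1 = [rng.randrange(p) for _ in range(10)]      # message length K = 10
msg2 = list(msg1)
msg2[4] = (msg2[4] + 1) % p                       # differ in one coefficient
collisions, trials = 0, 20000
for _ in range(trials):
    a = rng.randrange(p)                          # random projection position
    if poly_eval_hash(msg1, a, p) == poly_eval_hash(msg2, a, p):
        collisions += 1
# Two distinct degree-<10 polynomials agree on at most 9 of the 1009 points,
# so the collision rate is at most 9/1009 ≈ 0.009.
print(collisions / trials)
```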

###### Proof of Theorem 7.

By the above lemma, there is a constant $m$ such that the following condition is fulfilled: for large enough $n$, there is a distribution over functions $h : [n] \to [m]$ with entropy $O(\log \log n)$ such that for any $w, w' \in [n]$, $w \ne w'$, we have $\Pr[h(w) = h(w')] \le \delta'$ for a suitably small constant $\delta' > 0$.

Draw a permutation $\sigma$ of $[m]$ uniformly at random and define the permutation $\pi$ by ordering the items according to $\sigma \circ h$ and extending this ordering to a full permutation of $[n]$ arbitrarily.

Let $x_1, \dots, x_k$ be distinct items from $[n]$. Conditioned on $h(x_i) \ne h(x_j)$ for all $i \ne j$, we have $\pi(x_1) < \dots < \pi(x_k)$ with probability $1/k!$. Furthermore, applying a union bound in combination with the above lemma, the probability that there is some pair $i \ne j$ with $h(x_i) = h(x_j)$ is at most $\binom{k}{2}\delta'$. Therefore, the overall probability that $\pi(x_1) < \dots < \pi(x_k)$ is at least $\bigl(1 - \binom{k}{2}\delta'\bigr)\frac{1}{k!}$.

The entropy of the distribution that determines $\pi$ is at most the entropy of $h$ plus the entropy of $\sigma$, i.e., $O(\log \log n) + \log(m!) = O(\log \log n)$. ∎

## 3 Tight Bound on Entropy of Distribution

One of the consequences of the previous section is the fact that there are s-admissible (in fact, even s-optimal) distributions with entropy $O(\log \log n)$. In this section, we show that this bound is actually tight: every probability distribution of entropy $o(\log \log n)$ is not s-admissible. The crux of the proof lies in defining a notion of “semitone sequences”, sequences which satisfy a property similar to, but weaker than, monotonicity, and showing that an adversary can exploit the existence of long semitone sequences to force every algorithm to have a low probability of success.

###### Theorem 8.

A permutation distribution of entropy $o(\log \log n)$ cannot be s-admissible.

Here is the proof sketch. We use the fact that for distributions of entropy $H$ there is a subset of the support of size $k$ that is selected with probability at least $1 - \frac{8H}{\log(k-3)}$. It then suffices to show that if the distribution’s support size is at most $k$, then any algorithm’s probability of success against a worst-case adversary is at most $(k+1)/\log n$. The theorem then follows by setting $k$ appropriately. To bound the algorithm’s probability of success, we introduce the notion of semitone sequences, defined recursively as follows: an empty sequence is semitone with respect to any permutation $\pi$, and a sequence $(y_1, \dots, y_s)$ is semitone w.r.t. $\pi$ if $\pi(y_s)$ is either smaller or larger than all of $\pi(y_1), \dots, \pi(y_{s-1})$, and $(y_1, \dots, y_{s-1})$ is semitone w.r.t. $\pi$. We will show that given arbitrary permutations $\pi_1, \dots, \pi_k$ of $[n]$, there is always a sequence of length $\log n/(k+1)$ that is semitone with respect to all $k$ permutations. Later on, we show how an adversary can exploit this sequence to make any algorithm’s success probability small. To make the above arguments concrete, we start with this lemma.

###### Lemma 4.

Suppose $\pi_1, \dots, \pi_k$ are given, where each $\pi_i$ is a permutation over $[n]$. Then there exists a sequence $(y_1, \dots, y_\ell)$ that is semitone with respect to each $\pi_i$, with $\ell \ge \log n/(k+1)$.

###### Proof.

For a fixed permutation $\pi_i$ and a fixed item $y$, we define a function $h_i^y$ that indicates whether $\pi_i$ maps $x$ to a higher position than $y$ or not. Formally,

$$ h_i^y(x) \;=\; \begin{cases} 0 & \text{if } \pi_i(x) < \pi_i(y) \\ 1 & \text{if } \pi_i(x) > \pi_i(y). \end{cases} $$

Still keeping the item $y$ fixed, we now get a $k$-dimensional vector by concatenating the values $h_i^y(x)$ for the different $i$. This way, we obtain a hash function $h^y$, where $h^y(x) = (h_1^y(x), \dots, h_k^y(x)) \in \{0,1\}^k$.

Starting from $S_1 = [n]$, we now construct a sequence of nested subsets iteratively. At iteration $j$, given the set $S_j$, we do the following. For an arbitrary element $y_j$ of $S_j$, we hash each element of $S_j \setminus \{y_j\}$ to a value in $\{0,1\}^k$ by using $h^{y_j}$. Now $S_{j+1}$ is defined to be the set of occupants of the most occupied hash bucket.

Note that if we place $y_j$ at the end of any semitone sequence of elements of $S_{j+1}$, it will remain semitone with respect to each $\pi_i$: all elements of $S_{j+1}$ lie in the same bucket, hence on the same side of $y_j$ in every $\pi_i$. This in turn implies that for any $\ell$ the sequence $(y_\ell, y_{\ell-1}, \dots, y_1)$ is semitone with respect to all $\pi_i$.

It now remains to bound the length of the sequence we are able to generate. We achieve length $\ell$ if $S_{\ell+1}$ is the first empty set. At iteration $j$ of the above construction, we have $|S_j| - 1$ elements to hash and we have $2^k$ hash buckets, so $|S_{j+1}| \ge (|S_j| - 1)/2^k$ and therefore $|S_{j+1}| + 1 \ge (|S_j| + 1)/2^{k+1}$. As $S_1 = [n]$, this implies $|S_j| + 1 \ge (n+1)/2^{(k+1)(j-1)}$. So $\ell \ge \log n/(k+1)$. ∎
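Lemma 4’s construction is effective and simple to implement. In the sketch below (function names are ours), the pivots chosen at each iteration, read in reverse order of choice, form the desired sequence, and the semitone property can be verified directly against each permutation:

```python
import random

def is_semitone(seq, pi):
    """(y_1, ..., y_s) is semitone w.r.t. pi if pi places y_s before or after
    all earlier elements, and the prefix is semitone (checked iteratively)."""
    for j in range(len(seq) - 1, 0, -1):
        pos = [pi[y] for y in seq[:j]]
        if not (pi[seq[j]] < min(pos) or pi[seq[j]] > max(pos)):
            return False
    return True

def semitone_sequence(perms, n):
    """Lemma 4's construction: repeatedly hash the surviving items by their
    side (before/after the pivot) in every permutation and keep the largest
    bucket; the pivots, in reverse order of choice, form the sequence."""
    S = list(range(n))
    pivots = []
    while S:
        y = S[0]
        pivots.append(y)
        buckets = {}
        for x in S[1:]:
            key = tuple(int(pi[x] > pi[y]) for pi in perms)
            buckets.setdefault(key, []).append(x)
        S = max(buckets.values(), key=len) if buckets else []
    return pivots[::-1]            # last pivot first, so appending works out

rng = random.Random(4)
n, k = 256, 2                      # pi[x] = position of item x
perms = [rng.sample(range(n), n) for _ in range(k)]
seq = semitone_sequence(perms, n)
print(len(seq))                    # at least log(n)/(k+1) items long
assert all(is_semitone(seq, pi) for pi in perms)
```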

We now turn to showing that an adversary can exploit a semitone sequence of length $s$ and force any algorithm to have probability of success at most $1/s$. To show this, we look at the performance of the best deterministic algorithm against a particular distribution over assignments of values to items.

###### Lemma 5.

Let $(x_1, \dots, x_s)$ be a sequence of items and let $V$ be a set of $s$ distinct values. Assign values from $V$ to the items $x_1, \dots, x_s$ at random by

$$ \mathrm{value}(x_s) \;=\; \begin{cases} \max(V) & \text{with probability } 1/s \\ \min(V) & \text{with probability } 1 - 1/s \end{cases} $$

and then assigning values from $V \setminus \{\mathrm{value}(x_s)\}$ to the items $x_1, \dots, x_{s-1}$ recursively. Assign a value smaller than $\min(V)$ to all other items.

Consider an arbitrary algorithm and a permutation $\pi$ such that $(x_1, \dots, x_s)$ is semitone with respect to $\pi$. This algorithm selects the best item with probability at most $1/s$.

###### Proof.

Fixing some (deterministic) algorithm and permutation $\pi$, let $A_j$ be the event that the algorithm selects one of the items $x_1, \dots, x_j$ and let $B_j$ be the event that the algorithm selects the best item among $x_1, \dots, x_j$. We will show by induction that $\Pr[B_j] \le \frac{1}{j}\Pr[A_j]$. This will imply $\Pr[B_s] \le 1/s$.

For $j = 1$ this statement trivially holds. Therefore, let us consider some $j > 1$. By the induction hypothesis, we have $\Pr[B_{j-1}] \le \frac{1}{j-1}\Pr[A_{j-1}]$. As $(x_1, \dots, x_j)$ is semitone with respect to $\pi$, the item $x_j$ either comes before or after all of $x_1, \dots, x_{j-1}$. We distinguish these two cases.

Case 1: $x_j$ comes before all $x_1, \dots, x_{j-1}$. The algorithm can decide to accept $x_j$ (without seeing the items $x_1, \dots, x_{j-1}$). In this case, we have $A_j$ for sure. We only have $B_j$ if $x_j$ gets a higher value than $x_1, \dots, x_{j-1}$. By definition this happens with probability $1/j$. So, we have $\Pr[B_j] = \frac{1}{j} \le \frac{1}{j}\Pr[A_j]$. The algorithm can also decide to reject $x_j$. Then $A_j$ holds if and only if $A_{j-1}$ does. Furthermore, $B_j$ holds if and only if $B_{j-1}$ does and $x_j$ does not get the highest value among $x_1, \dots, x_j$. These events are independent, so $\Pr[B_j] = \bigl(1 - \frac{1}{j}\bigr)\Pr[B_{j-1}]$. Applying the induction hypothesis, we get $\Pr[B_j] \le \frac{j-1}{j} \cdot \frac{1}{j-1}\Pr[A_{j-1}] = \frac{1}{j}\Pr[A_j]$.

Case 2: $x_j$ comes after all $x_1, \dots, x_{j-1}$. When the algorithm comes to $x_j$, it may or may not have selected an item so far. If it has already selected an item (event $A_{j-1}$), then this item is the best among $x_1, \dots, x_{j-1}$ with probability at most $\frac{1}{j-1}\Pr[A_{j-1}]$ by the induction hypothesis. Independent of these events, $x_j$ is worse than the best item among $x_1, \dots, x_{j-1}$ with probability $1 - \frac{1}{j}$. Therefore, we get $\Pr[B_j \cap A_{j-1}] \le \bigl(1 - \frac{1}{j}\bigr)\frac{1}{j-1}\Pr[A_{j-1}] = \frac{1}{j}\Pr[A_{j-1}]$. It remains the case that the algorithm selects item $x_j$ (event $A_j \setminus A_{j-1}$). This item is better than $x_1, \dots, x_{j-1}$ with probability $\frac{1}{j}$. That is, $\Pr[B_j \cap (A_j \setminus A_{j-1})] \le \frac{1}{j}\Pr[A_j \setminus A_{j-1}]$. In combination, we have $\Pr[B_j] \le \frac{1}{j}\Pr[A_j]$. ∎
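A short simulation (names are ours) makes the strength of this value distribution visible: under the recursive assignment, each position of the sequence receives the maximum with probability exactly $1/s$, so no stopping rule can locate it with probability better than $1/s$:

```python
import random

def assign_best_index(s, rng):
    """Sample which position of the sequence x_1, ..., x_s receives the
    maximum value under Lemma 5's recursive assignment: x_j takes the current
    maximum with probability 1/j, otherwise the current minimum, and the
    remaining values are assigned to x_1, ..., x_{j-1} recursively."""
    for j in range(s, 0, -1):
        if rng.random() < 1.0 / j:
            return j        # x_j took max(V); everything assigned later is smaller
    return 1                # unreachable: the j = 1 step always takes the max

rng = random.Random(5)
s, trials = 8, 40000
counts = [0] * (s + 1)
for _ in range(trials):
    counts[assign_best_index(s, rng)] += 1
freqs = [c / trials for c in counts[1:]]
# The maximum lands on each of the s positions with probability exactly 1/s.
print(freqs)
```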

Now, to show Theorem 8, we first give a bound in terms of the support size of the distribution. In fact, Lemmas 4 and 5 combined with Yao’s principle imply that any algorithm’s probability of success against a worst-case adversary is at most $(k+1)/\log n$ (details of the proof are in Appendix B.2). Later on, we will show how this transfers to a bound on the entropy.

###### Lemma 6.

If is chosen from a distribution of support size at most , then any algorithm’s probability of success against a worst-case adversary is at most .

To get a bound on the entropy, we show that for a low-entropy distribution there is a small subset of the support that is selected with high probability. More precisely, we have the following technical lemma whose proof can be found in Appendix B.2.

###### Lemma 7.

Let be drawn from a finite set by a distribution of entropy . Then for any there is a set , , such that .
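To illustrate the flavor of this lemma numerically: a low-entropy distribution concentrates most of its mass on few outcomes. The helpers below (an illustrative sketch, not the lemma's actual construction) compute the entropy and greedily collect the highest-probability outcomes until a target mass is reached:

```python
import math

def shannon_entropy(probs):
    """Entropy in bits of a finite distribution given as a probability list."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def high_mass_subset(probs, target_mass):
    """Greedily pick outcomes in decreasing probability order until their
    total mass reaches target_mass; returns the chosen indices."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    chosen, total = [], 0.0
    for i in order:
        chosen.append(i)
        total += probs[i]
        if total >= target_mass:
            break
    return chosen
```

A distribution putting mass 0.9 on one outcome and spreading the rest over 99 others has entropy barely above one bit, and a single outcome already captures 90% of the mass.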

Finally, Theorem 8 is proven as a combination of Lemma 6 and Lemma 7.

###### Proof of Theorem 8.

Set . Lemma 7 shows that there is a set of permutations of size at least that is chosen with probability at least . The distribution conditioned on being in has support size only . Lemma 6 shows that if is chosen by a distribution of support size , then the probability of success of any algorithm against a worst-case adversary is at most . Therefore, we get

$$\begin{aligned}\Pr[\text{success}] &= \Pr[\pi\in\Pi]\,\Pr[\text{success}\mid\pi\in\Pi] + \Pr[\pi\notin\Pi]\,\Pr[\text{success}\mid\pi\notin\Pi]\\ &\le \Pr[\text{success}\mid\pi\in\Pi] + \Pr[\pi\notin\Pi]\\ &\le \frac{k+1}{\log n} + \frac{8H}{\log(k-3)} = o(1).\end{aligned}$$

## 4 Easy Distributions Are Hard to Characterize

Which distributions are s-admissible, meaning that they allow an algorithm to achieve constant probability of correct selection in the secretary problem? The results in §2 and §3 inspire hope that the -UIOP, the -BIP, or something very similar, is both necessary and sufficient for s-admissibility. Unfortunately, in this section we show that in some sense, it is hopeless to try formulating a comprehensible condition that is both necessary and sufficient. We construct a family of distributions with associated algorithms alg having constant success probability when the items are randomly ordered according to , but the complicated and unnatural structure of the distribution and algorithm underscores the pointlessness of precisely characterizing s-admissible distributions. In more objective terms, we construct a which is s-admissible, yet for any algorithm whose stopping rule is computable by circuits of size less than , the probability of correct selection is .

Throughout this section (and its corresponding appendix) we will summarize the adversary’s assignment of values to items by a permutation ; the largest value is assigned to item . If is any probability distribution over such permutations, we will let denote the probability that alg makes a correct selection when the adversary samples the value-to-item assignment from , and nature independently samples the item-to-time-slot assignment from . We will also let

$$\begin{aligned}V_{\underline{\pi}}(*,\underline{\sigma}) &= \max_{\textsc{alg}} V_{\underline{\pi}}(\textsc{alg},\underline{\sigma})\\ V_{\underline{\pi}}(\textsc{alg},*) &= \min_{\underline{\sigma}} V_{\underline{\pi}}(\textsc{alg},\underline{\sigma})\\ V_{\underline{\pi}} &= \min_{\underline{\sigma}}\max_{\textsc{alg}} V_{\underline{\pi}}(\textsc{alg},\underline{\sigma}).\end{aligned}$$

Thus, for example, the property that is s-admissible is expressed by the formula .

As a preview of the techniques underlying our construction, it is instructive to first consider a game against nature in which there is no adversary, and the algorithm is simply trying to pick out the maximum element when items numbered in order of decreasing value arrive in the random order specified by . This amounts to determining , where is the distribution that assigns probability 1 to the identity permutation. Our construction is based on the following intuition. In the secretary problem with uniformly random arrival order, the arrival order of items that arrived before time is uncorrelated with the order in which items arrive after time , and so the ordering of past elements is irrelevant to the question of whether to stop at time . However, there is a great deal of entropy in the ordering of elements that arrived before time ; it encodes bits of information. We will construct a distribution in which this information contained in the ordering of the elements that arrived before time fully encodes the time when the maximum element will arrive after time , but in an “encrypted” way that cannot be decoded by polynomial-sized circuits. We will make use of the well-known fact that a random function is hard on average for circuits of subexponential size.

###### Lemma 8.

If is a random function, then with high probability there is no circuit of size that outputs the function value correctly on more than fraction of inputs.

The simple proof of Lemma 8 is included in the appendix, for reference.

###### Theorem 9.

There exists a family of distributions such that , but for any algorithm alg whose stopping rule can be computed by circuits of size , we have .

###### Proof.

Assume for convenience that is divisible by 4. Fix a function such that no circuit of size outputs the value of correctly on more than fraction of inputs. The existence of such functions is ensured by Lemma 8. We use to define a permutation distribution as follows. For any binary string , define a permutation by performing the following sequence of operations. First, rearrange the items in order of increasing value by mapping item to position for each . Next, for , swap the items in positions and if and only if . Finally, swap the items in positions and . (Note that this places the maximum-value item in position .) The permutation distribution is the uniform distribution over .

It is easy to design an algorithm which always selects the item of maximum value when the input sequence is sampled from . The algorithm first decodes the unique binary string such that , by comparing the items arriving at times and for each and setting the bit according to the outcome of this comparison. Having decoded , we then compute and select the item that arrives at time . By construction, when is drawn from this is always the element of maximum value.
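The construction and its decoder can be prototyped end to end. In the sketch below, items are identified with their values 1..n, a simple stand-in function f replaces the circuit-hard random function, and the index conventions (bits encoded in consecutive pairs of the first half, selection at time n/2 + f(x)) are illustrative assumptions consistent with the description above:

```python
def encrypting_permutation(x, f, n):
    """Build the arrival order for bit string x: start from increasing-value
    order, swap the pair at positions 2i-1, 2i iff x_i = 1 (encoding x in
    the first half), then move the maximum-value item to time n/2 + f(x).
    Index conventions are illustrative assumptions."""
    perm = list(range(1, n + 1))          # perm[t] = item arriving at time t+1
    for i, bit in enumerate(x):
        if bit:
            perm[2 * i], perm[2 * i + 1] = perm[2 * i + 1], perm[2 * i]
    target = n // 2 + f(x) - 1            # 0-indexed slot for the max item
    j = perm.index(n)
    perm[j], perm[target] = perm[target], perm[j]
    return perm

def decode_and_select(perm, f, n):
    """Decode x from the pairwise comparisons in the first half of the
    sequence, then select the item arriving at time n/2 + f(x)."""
    x = tuple(1 if perm[2 * i] > perm[2 * i + 1] else 0 for i in range(n // 4))
    return perm[n // 2 + f(x) - 1]
```

Since the final swap only touches the second half of the sequence, the first half always decodes back to x, and the decoder recovers the maximum-value item for every bit string.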

Finally, if alg is any secretary algorithm, we can attempt to use alg to guess the value of for any input by the following simulation procedure. First, define a permutation by performing the same sequence of operations as in except for the final step of swapping the items in positions and ; note that this means that , unlike , can be constructed from input by a circuit of polynomial size. Now simulate alg on the input sequence , observe the time when it selects an item, and output . The circuit complexity of this simulation procedure is at most times the circuit complexity of the stopping rule implemented by alg, and the fraction of inputs on which it guesses correctly is precisely . (To verify this last statement, note that alg makes its selection at time when observing input sequence if and only if it also makes its selection at time when observing input sequence , because the two input sequences are indistinguishable to comparison-based algorithms at that time.) Hence, if then the stopping rule of alg cannot be implemented by circuits of size . ∎

Our main theorem in this section derives essentially the same result for the standard game-against-adversary interpretation of the secretary problem, rather than the game-against-nature interpretation adopted in Theorem 9.

###### Theorem 10.

For any function such that while , there exists a family of distributions such that , but any algorithm alg whose stopping rule can be computed by circuits of size satisfies .

The full proof is provided in Appendix B.3. Here we sketch the main ideas.

###### Proof sketch.

As in Theorem 9, the algorithm and “nature” (i.e., the process sampling the input order) will work in concert with each other to bring about a correct selection, using a form of coordination that is information-theoretically easy but computationally hard. The difficulty lies in the fact that the adversary is simultaneously working to thwart their efforts. If nature, for example, wishes to use the first half of the input sequence to “encrypt” the position where item 1 will be located in the second half of the sequence, then the adversary is free to assign the maximum value to item 2 and a random value to item 1, rendering the encrypted information useless to the algorithm.

Thus, our construction of the permutation distribution and algorithm alg will be guided by two goals. First, we must “tie the adversary’s hands” by ensuring that alg has constant probability of correct selection unless the adversary’s permutation, , is in some sense “close” to the identity permutation. Second, we must ensure that alg has constant probability of correct selection whenever is close to the identity, not only when it is equal to the identity as in Theorem 9. To accomplish the second goal we modify the construction in Theorem 9 so that the first half of the input sequence encodes the binary string using an error-correcting code. To accomplish the first goal we define to be a convex combination of two distributions: the “encrypting” distribution described earlier, and an “adversary-coercing” distribution designed to make it easy for the algorithm to select the maximum-value element unless the adversary’s permutation is close to the identity in an appropriate sense. ∎

## 5 Extensions Beyond Classic Secretary Problem

We look at two generalizations of the classic secretary problem in this section, namely the multiple-choice secretary problem, studied in [25], and the online weighted bipartite matching problem, studied extensively in [26, 23], under our non-uniform permutation distributions. We give a positive result showing that a natural variant of the algorithm in [25] achieves a -competitive ratio under the pseudo-random properties defined in §2, while for the latter we show that the algorithm proposed in [26] fails to achieve any constant competitive ratio under these properties.

##### Multiple-choice secretary problem

We consider the multiple-choice secretary problem (a.k.a. the -uniform matroid secretary problem). In this setting, not just a single secretary is to be selected, but up to . An algorithm observes items with non-negative values based on the ordering and chooses at most items in an online fashion. The goal is to maximize the sum of the values of the selected items. We consider distributions over permutations that fulfill the -BIP, for some . We show that a slight adaptation of the algorithm in [25] achieves competitive ratio , for large enough values of and and small enough .
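A simplified sketch in the spirit of the recursive algorithm of [25] can convey the structure: split the stream at a Binomial(n, 1/2) point, recurse with half the budget on the first part, and use a rank threshold from the first part to select greedily from the second. The split, threshold rule, and base case here are illustrative assumptions, not the paper's exact variant.

```python
import random

def classic_secretary(values):
    """Standard 1/e rule: skip a prefix, then take the first value that
    beats the best of the prefix."""
    n = len(values)
    cutoff = int(n / 2.718281828)
    best_seen = max(values[:cutoff], default=float("-inf"))
    for v in values[cutoff:]:
        if v > best_seen:
            return [v]
    return []

def multi_secretary(values, k):
    """Illustrative Kleinberg-style recursion for the k-choice problem:
    recurse with budget k//2 on a Binomial(n, 1/2) prefix, then accept
    items in the suffix that beat the (k//2)-th best prefix value."""
    n = len(values)
    if k <= 1 or n <= 1:
        return classic_secretary(values)
    m = sum(random.random() < 0.5 for _ in range(n))   # Binomial(n, 1/2)
    first, second = values[:m], values[m:]
    picked = multi_secretary(first, k // 2)
    top = sorted(first, reverse=True)
    threshold = top[k // 2 - 1] if len(top) >= k // 2 else float("-inf")
    for v in second:
        if len(picked) >= k:
            break
        if v > threshold:
            picked.append(v)
    return picked
```

The recursion always returns at most k items, since the recursive call uses budget k//2 and the suffix loop stops once the budget is exhausted.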

The algorithm is defined recursively. We denote by the call of the algorithm that operates on the prefix of length of the input. It is allowed to choose items and expects number of blocks. For , is simply the standard secretary algorithm that we analyzed in Section 2.1. For , the algorithm first draws a random number from a binomial distribution and then executes