Parallel Algorithms for Select and Partition with Noisy Comparisons
We consider the problem of finding the highest element in a totally ordered set of elements (Select), and partitioning a totally ordered set into the top and bottom elements (Partition) using pairwise comparisons. Motivated by settings like peer grading or crowdsourcing, where multiple rounds of interaction are costly and queried comparisons may be inconsistent with the ground truth, we evaluate algorithms based both on their total runtime and the number of interactive rounds in three comparison models: noiseless (where the comparisons are correct), erasure (where comparisons are erased with probability ), and noisy (where comparisons are correct with probability and incorrect otherwise). We provide numerous matching upper and lower bounds in all three models. Even our results in the noiseless model, which is quite well-studied in the TCS literature on parallel algorithms, are novel.
Rank aggregation is a fundamental problem with numerous important applications, ranging from well-studied settings such as social choice [CN91] and web search [DKNS01] to newer platforms such as crowdsourcing [CBCTH13] and peer grading [PHC13]. Salient common features among these applications is that in the end, ordinal rather than cardinal information about the elements is relevant, and a precise fine-grained ordering of the elements is often unnecessary. For example, the goal of social choice is to select the best alternative, regardless of how good it is. In a curved course, the goal of peer grading is to partition assignments into quantiles corresponding to A/B/C/D, etc, regardless of their absolute quality.
Prior work has produced numerous ordinal aggregation procedures (i.e. based on comparisons of elements rather than cardinal evaluations of individual elements) in different settings, and we overview those most relevant to our work in Section 1.1. However, existing models from this literature fail to capture an important aspect of the problem with respect to some of the newer applications; that multiple rounds of interaction are costly. In crowdsourcing, for instance, one round of interaction is the time it takes to send out a bunch of tasks to users and wait for their responses before deciding which tasks to send out next, which is the main computational bottleneck. In peer grading, each round of interaction might take a week, and grades are expected to be determined certainly within a few weeks. In conference decisions, even one round of interaction seems to be pushing the time constraints.
Fortunately, the TCS community already provides a vast literature of algorithms with this constraint in mind, under the name of parallel algorithms. For instance, previous work resolves questions like “how many interactive rounds are necessary for a deterministic or randomized algorithm to select the element with total comparisons?” [Val75, Rei81, AKSS86, AA88a, AA88b, BB90]. This line of research, however, misses a different important aspect related to these applications (that is, in fact, captured by most works in rank aggregation), that the comparisons might be erroneous. Motivated by applications such as crowdsourcing and peer grading, we therefore study the round complexity of Partition, the problem of partitioning a totally ordered set into the top and bottom elements, when comparisons might be erroneous.
Our first results on this front provide matching upper and lower bounds on what is achievable for Partition in just one round in three different models of error: noiseless (where the comparisons are correct), erasure (where comparisons are erased with probability ), and noisy (where comparisons are correct with probability and incorrect otherwise). We provide one-round algorithms using comparisons that make and mistakes (a mistake is any element placed on the wrong side of the partition) with high probability in the three models, respectively. The algorithms are randomized and different for each model, and the bounds hold both when is an absolute constant or a function of and . We provide asymptotically matching lower bounds as well: all (potentially randomized) one-round algorithms using comparisons necessarily make , and mistakes in expectation in the three models, respectively. We further show that the same algorithms and lower bound constructions are also optimal (up to absolute constant factors) if mistakes are instead weighted by various different measures of their distance to , the cutoff.
After understanding completely the tradeoff between the number of comparisons and mistakes for one-round algorithms in each of the three models, we turn our attention to multi-round algorithms. Here, the results are more complex and can’t be summarized in a few sentences. We briefly overview our multi-round results in each of the three models below. Again, all of the upper and lower bounds discussed below extend when mistakes are weighted by their distance to the cutoff. We overview the techniques used in proving our results in Section 1.2, but just briefly note here that the level of technicality roughly increases as we go from the noiseless to erasure to noisy models. In particular, lower bounds in the noisy model are quite involved.
Multi-Round Results in the Noiseless Model.
We design a 2-round algorithm for Partition using total comparisons that makes mistakes with probability , and prove a nearly matching lower bound of mistakes, for any ( may be a constant or a function of ).
We design a 3-round algorithm for Partition making total comparisons that makes zero mistakes with probability . It is known that total comparisons are necessary for a 3-round algorithm just to solve Select, the problem of finding the element, with probability [BB90].
We design a 4-round algorithm for Partition making total comparisons that makes zero mistakes with probability . This matches the guarantee provided by an algorithm of Bollobás and Brightwell for Select, but is significantly simpler (in particular, it avoids any graph theory) [BB90].
Multi-Round Results in the Erasure Model.
We design a -round algorithm for Partition making total comparisons that makes zero mistakes with probability .
We show that no -round algorithm even for Select making total comparisons can succeed with probability .
Multi-Round Results in the Noisy Model.
We design a 4-round algorithm for Partition making comparisons that makes zero mistakes with high probability (a trivial corollary of our noiseless algorithm).
We show that no algorithm even for Select making comparisons can succeed with probability (in any number of rounds).
We design an algorithm for findMin (the special case of Select with ) making comparisons that succeeds with probability . We also show that no algorithm making comparisons can solve findMin with probability (in any number of rounds).
Together, these results tell an interesting story. In one round, one can obtain the same guarantee in the noiseless versus erasure model with an additional factor of comparisons. And one can obtain the same guarantee in the erasure versus noisy model with an additional factor of comparisons. In some sense, this should be expected, because this exactly captures the degradation in information provided by a single comparison in each of the three models (a noiseless comparison provides one bit of information, an erasure comparison provides bits of information, and a noisy comparison provides bits of information). But in multiple rounds, everything changes. In four rounds, one can perfectly partition with high probability and total comparisons in the noiseless model. In the erasure model, one can indeed partition perfectly with high probability and comparisons, but now it requires rounds instead of just . Moreover, in the noisy model, any algorithm even solving Select with probability requires an blow-up in the number of comparisons, in any number of rounds! Note that neither of these additional factors come from the desire to succeed with high probability (as the lower bounds hold against even a success) nor the desire to partition every element correctly (as the lower bounds hold even for just Select), but just from the way in which interaction helps in the three different models.
While we believe that the story told by our work as a whole provides the “main result,” it is also worth emphasizing independently our results in the noisy model. Our one-round algorithm, for instance, is more involved than its counterparts in the noiseless and erasure models and our analysis uses the theory of biased random walks. Our multi-round lower bounds against Select and findMin in the noisy model are the most technical results of the paper, and tell their own interesting story about the difference between findMin and Select in the noisy model. To our knowledge, most tight lower bounds known for Select come directly from lower bounding findMin. It’s surprising that findMin requires fewer comparisons than Select to solve with probability in the noisy model.
We proceed now by discussing some related works below, and briefly overviewing our techniques in Section 1.2. We provide some conclusions and future directions in Section 1.3. Our single-round results are discussed in Section 3 and our multi-round results are discussed in Section 4. However, due to space constraints, all proofs are deferred to the appendix.
1.1 Related Work
Rank aggregation is an enormous field that we can’t possibly summarize in its entirety here. Some of the works most related to ours also study Partition (sometimes called Top-K). Almost all of these works also consider the possibility of erroneous comparisons, although sometimes under different models where the likelihood of an erroneous comparison scales with the distance between the two compared elements [CS15, BSC13, Eri13]. More importantly, to our knowledge this line of work either considers settings where the comparisons are exogenous (the designer has no control over which comparisons are queried, she can just analyze the results), or only analyze the query complexity and not the round complexity of designed algorithms. Our results contribute to this line of work by providing algorithms designed for settings like crowdsourcing or peer grading where the designer does have design freedom, but may be constrained by the number of interactive rounds.
There is a vast literature from the parallel algorithms community studying various sorting and selection problems in the noiseless model. For instance, tight bounds are known on the round complexity of Select for deterministic algorithms using total comparisons (it is ) [Val75, AKSS86], and randomized algorithms using total comparisons (it is ) [AA88b, AA88a, Rei81, BB90]. Similar results are known for sorting and approximate sorting as well [Col88, AAV86, AKS83, HH81, BT83, BH85, Lei84]. Many of the designed deterministic algorithms provide sorting networks. A sorting network on elements is a circuit whose gates are binary comparators. The depth of a sorting network is the number of required rounds, and the number of gates is the total number of comparisons. Randomized algorithms are known to require fewer rounds than deterministic ones with the same number of total comparisons for both sorting and selecting [AA88a, BB90].
In the noisy model, one can of course take any noiseless algorithm and repeat every comparison times in parallel. To our knowledge, positive results that avoid this simple repetition are virtually non-existent. This is likely because a lower bound of Leighton and Ma [LM00] proves that in fact no sorting network can provide an asymptotic improvement (for complete sorting), and our lower bound (Theorem 11) shows that no randomized algorithm can provide an asymptotic improvement for Select. To our knowledge, no prior work studies parallel sorting algorithms in the erasure model. On this front, our work contributes by addressing some open problems in the parallel algorithms literature, but more importantly by providing the first parallel algorithms and lower bounds for Select in the erasure and noisy models.
There is also an active study of sorting in the noisy model [BM08, BM09, MMV13] within the TCS community without concern for parallelization, but with concern for resampling. An algorithm is said to resample if it makes the same comparison multiple times. Clearly, an algorithm that doesn’t resample can’t possibly find the median exactly in the noisy model (what if the comparison between and is corrupted?). The focus of these works is designing poly-time algorithms to find the maximum-likelihood ordering from a set of noisy comparisons. Our work is fundamentally different from these, as we have asymptotically fewer than comparisons to work with, and at no point do we try to find a maximum-likelihood ordering (because we only want to solve Partition).
1.2 Tools and Techniques
Single Round Algorithms and Lower Bounds. Our single round results are guided by the following surprisingly useful observation: in order for an algorithm to possibly know that exceeds the highest element, must at least be compared to some element between itself and (as otherwise, the comparison results would be identical if we replaced with an element just below ). Unsurprisingly, it is difficult to guarantee that many elements within of are compared to elements between themselves and using only total comparisons in a single round, and this forms the basis for our lower bounds. Our upper bounds make use of this observation as well, and basically are able to guarantee that an element is correctly placed with high probability whenever it is compared to an element between itself and . It’s interesting that the same intuition is key to both the upper and lower bounds. We provide a description of the algorithms and proofs in Section 3.
In the erasure model, the same intuition extends, except that in order to have a non-erased comparison between and an element between and , we need to make roughly such comparisons. This causes our lower bounds to improve by a factor of . In the noisy model, the same intuition again extends, although this time the right language is that we need to learn bits of information from comparisons of to elements between and , which requires such comparisons, and causes the improved factor of in our lower bounds. Our algorithms in these two models are similar to the noiseless algorithm, but the analysis becomes necessarily more involved. For instance, our analysis in the noisy model appeals to facts about biased random walks on the line.
Multi-Round Algorithms and Lower Bounds. Our constant-round algorithms in the noiseless model are based on the following intuition: once we reach the point that we are only uncertain about elements, we are basically looking at a fresh instance of Partition on a significantly smaller input size, except we’re still allowed comparisons per round. Once we’re only uncertain about only elements, one additional round suffices to finish up (by comparing each element to every other one). The challenge in obtaining a four-round algorithm (as opposed to just an -round algorithm) is ensuring that we make significant enough gains in the first three rounds.
Interestingly, these ideas for constant-round algorithms in the noiseless model don’t prove useful in the erasure or noisy models. Essentially the issue is that even after a constant number of rounds, we are unlikely to be confident that many elements are above or below , so we can’t simply recurse on a smaller instance. Still, it is quite difficult to discover a formal barrier, so our multi-round lower bounds for the erasure and noisy models are quite involved. We refer the reader to Section 4 for further details.
We study the problems of Partition and Select in settings where interaction is costly in the noiseless, erasure, and noisy comparison models. We provide matching (up to absolute constant factors) upper and lower bounds for one round algorithms in all three models, which also show that the number of comparisons required for the same guarantee degrade proportional to the information provided by a single comparison. We also provide matching upper and lower bounds for multi-round algorithms in all three models, which also show that the round and query complexity required for the same guarantee in these settings degrades worse than just by the loss in information when moving between the three comparison models. Finally, we show a separation between findMin and Select in the noisy model.
We believe our work motivates two important directions for future work. First, our work considers some of the more important constraints imposed on rank aggregation algorithms in applications like crowdsourcing or peer grading, but not all. For instance, some settings might require that every submission receives the same amount of attention (i.e. is a member of the same number of comparisons), or might motivate a different model of error (perhaps where mistakes aren’t independent or identical across comparisons). It would be interesting to design algorithms and prove lower bounds under additional restrictions motivated by applications.
Finally, it is important to consider incentives in these applications. In peer grading, for instance, the students themselves are the ones providing the comparisons. An improperly designed algorithm might provide “mechanism design-type” incentives for the students to actively misreport if they think it will boost their own grade. Additionally, there are also “scoring rule-type” incentives that come into play: grading assignments takes effort! Without proper incentives, students may choose to put zero or little effort into their grading and just provide random information. We believe that using ordinal instead of cardinal information will be especially helpful on this front, as it is much easier to design mechanisms when players just make binary decisions, and it’s much easier to understand how the noisy information provided by students scale with effort (in our models, it is simply that will increase with effort). It is therefore important to design mechanisms for applications like peer grading by building off of our algorithms.
2 Preliminaries and Notation
In this work, we study two problems, Select and Partition. Both problems take as input a randomly sorted, totally ordered set and an integer . For simplicity of notation, we denote the smallest element of the set as . So if the input set is of size , the input is exactly . In Select, the goal is to output the (location of the) element . In Partition, the goal is to partition the elements into the top , which we’ll call for Accept and the bottom , which we’ll call for Reject. Also for ease of notation, we’ll state all of our results for , the median, w.l.o.g.
We say an algorithm solves Select if it outputs the median, and solves Partition if it places correctly all elements above and below the median. For Select, we will say that an algorithm is a -approximation with probability if it outputs an element in with probability at least . For Partition, we will consider a class of success measures, parameterized by a constant , and say the -weighted error associated with a specific partitioning into is equal to .
Query and Round Complexity. Our algorithms will be comparison-based. We study both the number of queries, and the number of adaptive rounds necessary to achieve a certain guarantee.
Notation. We always consider settings where the input elements are a priori indistinguishable, or alternatively, that our algorithms randomly permute the input before making comparisons. When we write , we mean literally that in the ground truth. In the noisy model, the results of comparisons may disagree with the underlying ordering, so we say that beats if a noisy comparison of and returned as larger than (regardless of whether or not ).
Models of Noise. We consider three comparison models, which return the following when .
Noiseless: Returns beats .
Erasure: Returns beats with probability , and with probability .
Noisy: Returns beats with probability , and beats with probability .
Partition versus Select. We design all of our algorithms for Partition, and prove all of our lower bounds against Select. We do this because Select is in some sense a strictly easier problem than Partition. We discuss how one can get algorithms for Select via algorithms for Partition and vice versa formally in Appendix A.
Resampling. Finally, note that in the erasure and noisy models, it may be desireable to query the same comparison multiple times. This is called resampling. It is easy to see that without resampling, it is impossible to guarantee that the exact median is found with high probability, even when all comparisons are made (what if the comparison between and is corrupted?). Resampling is not necessarily undesireable in the applications that motivate this work, so we consider our main results to be in the model where resampling is allowed. Still, it turns out that all of our algorithms can be easily modified to avoid resampling at the (necessary) cost of a small additional error, and it is easy to see the required modifications.
3 Results for Non-Adaptive Algorithms
In this section, we provide our results on non-adaptive (round complexity = 1) algorithms. We begin with the upper bounds below, followed by our matching (up to constant factors) lower bounds.
3.1 Upper Bounds
We provide asymptotically optimal algorithms in each of the three comparison models. Our three algorithms actually choose the same comparisons to make, but determine whether or not to accept or reject an element based on the resulting comparisons differently. The algorithms pick a skeleton set of size and compare every element in to every other element. Each element not in is compared to random elements of . Pseudocode for this procedure is given in Appendix B.
From here, the remaining task in all three models is similar: the algorithm must first estimate the rank of each element in the skeleton set. Then, for each , it must use this information combined with the results of comparisons to guess whether should be accepted or rejected. The correct approach differs in the three models, which we discuss next.
Noiseless Model. Pseudocode for our algorithm in the noiseless model is provided as Algorithm 2 in Appendix B. First, we estimate that the median of the skeleton set, , is close to the actual median. Then, we hope that each is compared to some element in between itself and . If this happens, we can pretty confidently accept or reject . If it doesn’t, then all we learn is that is beaten by some elements above and it beats some elements below , which provides no helpful information about whether is above or below the median, so we just make a random decision.
Algorithm 2 has query complexity , round complexity , does not resample, and outputs a partition that, for all , has:
expected -weighted error , for any
-weighted error with probability , for any .
We provide a complete proof of Theorem 1 in Appendix B. The main ideas are the following. There are two sources of potential error in Algorithm 2. First, maybe the skeleton set is poorly chosen and not representative of the ground set. But this is extremely unlikely with such a large skeleton set. Second, note that if is compared to any element in between itself and , and is very close to , then will be correctly placed. If , then we’re unlikely to miss this window on independent tries, and will be correctly placed.
Erasure Model. In the erasure model, pseudocode for the complete algorithm we use is Algorithm 3 in Appendix B. At a high level, the algorithm is similar to Algorithm 2 for the noiseless model, so we refer the reader to Appendix B to see the necessary changes.
Algorithm 3 has query complexity , round complexity , does not resample, and outputs a partition that, for all , has:
expected -weighted error , for any such that
-weighted error with probability , whenever and .
We again postpone a complete proof of Theorem 2 to Appendix B. The additional ingredient beyond the noiseless case is a proof that with high probability, not too many of the comparisons within are erased and therefore while we can’t learn the median of exactly, we can learn a set of almost elements that are certainly above the median, and almost elements that are certainly below. If beats an element that is certainly above the median of , we can confidently accept it, just like in the noiseless case.
Noisy Model. Pseudocode for our algorithm in the noisy model is provided as Algorithm 4 in Appendix B. Algorithm 4 is necessarily more involved than the previous two. We can still recover a good ranking of the elements in the skeleton set using the Braverman-Mossel algorithm [BM08], so this isn’t the issue. The big difference between the noisy model and the previous two is that no single comparison can guarantee that should be accepted or rejected. Instead, every time we have a set of elements all above the median of , , of which beats at least half, this provides some evidence that should be accepted. Every time we have a set of elements all below of which is beaten by at least half, this provides some evidence that should be rejected. The trick is now just deciding which evidence is stronger. Due to space constraints, we refer the reader to Algorithm 4 to see our algorithm, which we analyze using theory from biased random walks on the line.
Algorithm 4 has query complexity , round complexity , does not resample, and outputs a partition that, for all , has:
expected -weighted error , for any , .
-weighted error with probability , for any , .
3.2 Lower Bounds
In this section, we show that the algorithms designed in the previous section are optimal up to constant factors. All of the algorithms in the previous section are “tight,” in the sense that we expect element to be correctly placed whenever it is compared to enough elements between itself and the median. In the noiseless model, one element is enough. In the erasure model, we instead need (to make sure at least one isn’t erased). In the noisy model, we need (to make sure we get bits of information about the difference between and the median). If we don’t have enough comparisons between and elements between itself and the median, we shouldn’t hope to be able to classify correctly, as the comparisons involving would look nearly identical if we replaced with an element just on the other side of the median. Our lower bounds capture this intuition formally, and are all proved in Appendix B.
For all , , any non-adaptive algorithm with query complexity necessarily has expected -weighted error in the noiseless model, in the erasure model, and in the noisy model.
4 Results for Multi-Round Algorithms
4.1 Noiseless Model
We first present our algorithm and nearly matching lower bound for 2-round algorithms. The first round of our algorithm tries to get as good of an approximation to the median as possible, and then compares it to every element in round two. Getting the best possible approximation is actually a bit tricky. For instance, simply finding the median of a skeleton set of size only guarantees an element within of the median.
For any , our algorithm starts with a huge skeleton set of random samples from . This is too large to compare every element in with itself, so we choose a set of random pivots. Then we compare every element in to every element in , and we will certainly learn two pivots, and such that the median of lies in , and a such that the median of is exactly the element of . Now, we recurse within and try to find the element. Of course, because all of these comparisons happen in one round, we don’t know ahead of time in which subinterval of we’ll want to recurse, so we have to waste a bunch of comparisons. These continual refinements still make some progress, and allow us to find a smaller and smaller window containing the median of , which is a very good approximation to the true median because was so large. Pseudocode for our algorithm is Algorithm 5 in Appendix C, which “recursively” tries to find the element of .
For all and , Algorithm 5 has round complexity , query complexity , and outputs a partition that:
has expected -weighted error at most
has -weighted error at most with probability at least .
Note that setting , and such that , we get an algorithm with round complexity , query complexity that outputs a partition with -weighted error with probability .
We also prove a nearly matching lower bound on two-round algorithms in the noiseless model. At a very high level, our lower bound repeats the argument of our one round lower bound twice. Specifically, we show that after one round, there are many elements within a window of size of the median such that a constant fraction of these elements have not been compared to any other elements in this window. We then show that after the second round, conditioned on this, there is necessarily a window of size such that a constant fraction of these elements have not been compared to any other elements in this window. Finally we show that this implies that we must err on a constant fraction of these elements. The actual proof is technical, but follows this high level outline. Proofs of Theorems 5 and 6 can be found in Appendix C.
For all , and any , any algorithm with query complexity and round complexity necessarily has expected -weighted error .
From here we show how to make use of our two-round algorithm to design a three-round algorithm that makes zero mistakes with high probability. After our two-round algorithm with appropriate parameters, we can be pretty sure that the median lies somewhere in a range of , so we can just compare all of these elements to each other in one additional round. Pseudocode for Algorithm 6 is in Appendix C.
For all , Algorithm 6 has query complexity , round complexity , and outputs a partition with zero -weighted error with probability .
Again, recall that queries are necessary for any three-round algorithm just to solve Select with probability [BB90]. Finally, we further make use of ideas from our two-round algorithm to design a simple four round algorithm that has query complexity and makes zero mistakes with high probability. More specifically, we appropriately tune the parameters for our two-round algorithm (i.e. set ) to find a window of size that contains the median (and already correctly partition all other elements). We then use similar ideas in round three to further find a window of size that contains the median (and again correctly partition all other elements). We use the final round to compare all remaining uncertain elements to each other and correctly partition them.
For all , and any , Algorithm 7 has query complexity , round complexity , and outputs a partition with zero -weighted error with probability at least .
4.2 Erasure and Noisy Models
Here we briefly overview our results on multi-round algorithms in the erasure and noisy models. We begin with an easy reduction from these models to the noiseless model, at the cost of a blow-up in the round or query complexity. Essentially, we are just observing that one can adaptively resample any comparison in the erasure model until it isn’t erased (which will take resamples in expectation), and also that one can resample in parallel any comparison in either the erasure or noisy model the appropriate number of times and have it effectively be a noiseless comparison.
If there is an algorithm solving Partition, Select or findMin in the noiseless model with probability that has query complexity and round complexity , then there are also algorithms that resample that:
solve Partition, Select or findMin in the erasure model with probability that have expected query complexity , but perhaps with expected round complexity as well.
solve Partition, Select or findMin in the erasure model with probability that have query complexity , and round complexity .
solve Partition, Select or findMin in the noisy model with probability that have query complexity , and round complexity .
There are algorithms that resample that:
solve Partition or Select in the erasure model with probability with query complexity and round complexity .
solve Partition or Select in the noisy model with probability with query complexity and round complexity .
In the erasure model, the algorithms provided by this reduction do not have the optimal round/query complexity. We show that queries are necessary and sufficient, as well as rounds. For the algorithm, we begin by finding the median of a random set of size elements. This can be done in rounds and total comparisons by Corollary 1. Doing this twice in parallel, we find two elements that are guaranteed to be above/below the median, but very close. Then, we spend rounds comparing every element to both of these. It’s not obvious that this can be done in rounds. Essentially what happens is that after each round, a fraction of elements are successfully compared, and we don’t need to use any future comparisons on them. This lets us do even more comparisons involving the remaining elements in future rounds, so the fraction of successes actually increases with successive rounds. Analysis shows that the number of required rounds is therefore (instead of if the fraction was constant throughout all rounds). After this, we learn for sure that the median lies within a sublinear window, and we can again invoke the 4-round algorithm of Corollary 1 to finish up. Our lower bound essentially shows that it takes rounds just to have a non-erased comparison involving all elements even with per round, and that this implies a lower bound. Pseudocode for the algorithm and proofs of both theorems are in Appendix D.
With probability at least , Algorithm 9 has query complexity , round complexity , and solves Partition.
Assume . In the erasure model, any algorithm solving Select with probability even with comparisons per round necessarily has round complexity .
We now introduce a related problem that is strictly easier than Partition or Select, which we call Rank, and prove lower bounds on the round/query complexity of Rank noisy models, which will imply lower bounds on Partition and Select. In Rank, we are given as input a set of elements, and a special element and asked to determine ’s rank in (i.e. how many elements in are less than ).
At a high level, we show (in the proof of Theorem 11) that with only queries, it’s very likely that there are a constant fraction of ’s such that the algorithm is can’t be very sure about the relation between and . This might happen, for instance, if not many comparisons were done between and and they were split close to 50-50. From here, we use an anti-concentration inequality (the Berry-Essen inequality) to show that the rank of does not concentrate within some range of size conditioned on the available information. In otherwords, the information available simply cannot narrow down the rank of to within a small window with decent probability, no matter how that information is used. We then conclude that no algorithms with comparisons can approximate the rank well with probability .
In the noisy model, any algorithm obtaining an -approximation for Rank with probability necessarily has query complexity .
Finally, we conclude with an algorithm for findMin in the noisy model showing that findMin is strictly easier than Select. This is surprising, as most existing lower bounds against Select are obtained by bounding findMin. Our algorithm again begins by finding the minimum, , of a random set of size using total comparisons by Corollary 1. Then, we iteratively compare each element to a fixed number of times, throwing out elements that beat it too many times. Again, as we throw out elements, we get to compare the remaining elements to more and more. We’re able to show that after only an appropriate number of iterations (so that only total comparisons have been made), it’s very likely that only elements remain, and that with constant probability the true minimum was not eliminated. From here, we can again invoke the algorithm of Corollary 1 to find the true minimum (assuming it wasn’t eliminated).
Assume is large enough and . Algorithm 10 has query complexity and solves findMin in the noisy model with probability at least .
Assume , is large enough and . Any algorithm in the noisy model with query complexity solves findMin with probability at most .
Theorem 12 shows that findMin is strictly easier than Select (as it can be solved with constant probability with asymptotically fewer comparisons). Theorem 13 is included for completeness, and shows that it is not possible to get a better success probability without a blow-up in the query complexity. The proof of Theorem 13 is similar to that of Theorem 11.
Appendix A Technical Lemmas
Before beginning our proofs, we provide a few technical lemmas that will be used throughout, related to geometric sums, biased random walks, etc.
We first show the following reductions to prove that is the most difficult choice of in Select and Partition. We also show the reductions between Select and Partition.
The following relations hold:
Suppose can solve Select / Partition in the case of elements for any , but only for . Then can be used to solve Select / Partition for any with the same success probability.
Suppose can solve Select on elements. We can construct algorithm based on to solve Partition of elements with one more round and extra comparisons in the noiseless model, in the erasure model, and in the noisy model. The success probability decreases by except in the noiseless model (where it doesn’t decrease).
Suppose can solve Partition of elements . We can construct algorithm based on to solve Select of elements with the same number of round and twice the number of comparisons. If the original success probability was , the new success probability is at least .
Let’s show the reductions one by one:
Wlog, let . The algorithm to solve Select / Partition is the following:
Generate dummy elements which are smaller than all the elements.
Run on elements together with dummy elements.
Output ’s output.
It’s easy to check the above algorithm works.
is the following:
Run , let be the median output.
In the next round, compare every element to (once in the noiseless model, times in the erasure model, and times in the noisy model). For each element, if it beats , accept it (in the noisy model, if at least half of the comparisons beat ). Otherwise reject it.
It’s easy to check works.
is the following:
Generate two dummy elements and . is smaller than all of the elements and is larger than all of the elements.
Do the followings in parallel:
Run on elements and .
Run on elements and .
Output the element that is accepted in the first run and rejected in the second run.
It’s easy to see that works.
For , . For ,
First, note that the function has derivative zero exactly once in , at . So the function is increasing on and decreasing on . This means that , and that . Putting these together, we get:
Note that , and , completing the proof. ∎
The distribution that is heads with probability and tails with probability has bits of entropy.
Consider a biased random walk that moves right with probability and left with probability at every step. Then the probability that this random walk reaches units right of the origin at any point in time is exactly .
First, note that the probability that this random walk reaches units right of the origin at any point in time is exactly the probability that a random walk with the same bias reaches unit right of the origin times independently, because once the random walk reaches unit right of the origin, the remaining random walk acts like a fresh random walk that now only needs to move units to the right at some point in time. So we just need to show that the probability that the random walk moves unit to the right at some point in time is .
Note that whatever this probability, , is, it satisfies the equality . This is because the probability that the random walk moves right is equal to the probability that the random walk moves right on its first step, plus the probability that the random walk moves left on its first step, and then moves two units right at some point in time. This equation has two solutions, and . So now we just need to show that when .
Assume for contradiction that . Then this means that the random walk not only reaches one unit right of the origin once during the course of the random walk, but that it reaches one unit right of the origin infinitely many times, as every time the walk reaches the origin is a fresh random walk that moves one unit right with probability one. So let denote the random variable that is if the walk moves right at time , and if the walk moves left at time . We have just argued that if , there are infinitely many such that . Therefore, . However, we also know that , and the s are independent. So the law of large numbers states that , a contradiction.
We also include a technical lemma confirming that it is okay to do all of our sampling without replacement, if desired.
Let be any set of elements, all in . Let be samples from without replacement. Then .
Define . It’s clear that the , form a martingale (specifically, the Doob martingale for ), and that . So we just need to reason about how much the conditional expectation can possibly change upon learning a single , then we can apply Azuma’s inequality.
Conditioned on , each of the remaining elements of are equally likely to be chosen, and each is chosen with probability exactly . So . How much can possibly be? We have (below, unsampled means elements that are still unsampled even after step ):
It is clear that the above quantity is at most , because all , and as well. It is also clear that the above quantity is at least , as , and . So the Doob martingale has differences at most at each step, and a direct application of Azuma’s inequality yields the desired bound. ∎
Appendix B Proofs for Non-Adaptive Algorithms
b.1 Upper Bounds
Proof of Theorem 1: We consider the error contributed by (which is the same as ). There are two events that might cause to be misplaced. First, maybe loses to for some . This is unlikely because this can only happen in the event that is a poor approximation to the median. Second, maybe is never compared to an element . This is also unlikely because the fraction of such elements should be about . We bound the probabilty of the former event first.
Let denote the fraction of elements in greater than . Then with probability at least .
is the average of independent random variables, each denoting whether some . . Applying Lemma 4 yields the lemma. ∎
We call a skeleton set is good if for , . Now let’s fix the skeleton set and assume is good. From the above lemma, we know is good with probability at least .
If is good, then the probability that is rejected is at most (and at most if ).
The elements that are compared to are chosen uniformly at random from . At least a fraction of them are less than , so an fraction of elements in lie in . So each time we choose a random element of to compare to , we have at least a chance of comparing to an element in . In the event that this happens, we are guaranteed to accept . The probability that we miss on each of independent trials is exactly . ∎
Therefore, conditioning that is good, the expected -weighted error contributed by elements is at most :
The last inequality is a corollary of Lemma 2 proved in Appendix A. Taking shows that the expected -weighted error contributed by elements is conditioned on is good. Notice that when is fixed, the -weighted error contributed by each element is independent and bounded by . By the Hoeffding bound, the probability that the -weighted error exceeds its expectation by more than is at most conditioned on is good. To sum up, taking elements that are smaller than the median into account, we know that with probability at least , the -weighted error is .
Here we also show that our algorithm is better than a simpler solution. The simpler solution would be to just compare every element to a random elements, and accept if it is larger than at least half, and reject otherwise. We show that, unlike the algorithm above, this doesn’t obtain asymptotically optimal error as grows.
The simple solution has expected error .
Let be an indicator variable for the event that is smaller than the element it is compared to. Then is accepted iff . As each is a bernoulli random variable that is with probability , the probability that is mistakenly rejected is exactly the probability that a random variable exceeds its expectation by at least . This happens with probability on the order of . So for , this is at least , meaning that the error contribution from all , for some absolute constant is at least . Summing over all means that the total error is at least . ∎
Proof of Theorem 2: We again consider the error contributed by (which is the same as ). There are again two events that might cause to be misplaced. First, maybe it is beaten an element in . This is unlikely because this can only happen in the event that some element below the median makes it into . Second, maybe is never compared to an element that beats it in . This is also unlikely because the fraction of such elements should be about . We bound the probabilty of the former event first, making use of Lemma 5.
Again let denote the fraction of elements in that are smaller than , and define to be good if for all . Then Lemma 5 guarantees that is good with probability at least . This time, in addition to being good, we also need to make sure that is large (which will happen as long as not too many comparisons are erased).
Let such that and . Then with probability at least , is known to beat .
is known to beat if there is some such that the comparisons between and and and are both not erased. There are such possible , and all comparisons are erased independently. So the probability that for all , at least one of the comparisons to were erased is . ∎
For all , with probability at least , both and have at least elements.
By Lemma 7 and a union bound, with probability at least , it is the case that for all that have at least elements between them, it is known whether beats or vice versa. In the event that this happens, any element that is at least elements away from the median will be in or . ∎
We’ll call a skeleton set really good if it is good, and . Now let’s fix the skeleton set and assume is really good. From the above arguments, we know is really good with probability at least .
Next, observe that if , and , then there are at least elements in less than . Therefore the probability that never beats an element in is at most (and at most if ). So the total -weighted error that comes from these cases is at most .
Conditioning on is really good, the expected -weighted error is
Taking shows that the expected -weighted error contributed by elements is conditioned on is really good. Notice that when is fixed, the -weighted error contributed by each element is independent and bounded by . By the Hoeffding bound, the probability that the -weighted error exceeds its expectation by more than is at most , conditioned on is really good. To sum up, taking elements that are smaller than the median into account, we know that with probability at least , the -weighted error is .