Analysis of the Expected Number of Bit Comparisons
Required by Quickselect
James Allen Fill
Research for both authors supported by NSF grant DMS–0406104, and by The Johns Hopkins University’s Acheson J. Duncan Fund for the Advancement of Research in Statistics.
Department of Applied Mathematics and Statistics
The Johns Hopkins University
firstname.lastname@example.org and http://www.ams.jhu.edu/~fill/
Department of Applied Mathematics and Statistics
The Johns Hopkins University
email@example.com and http://www.ams.jhu.edu/~nakama/
When algorithms for sorting and searching are applied to keys that are represented as bit strings, we can quantify the performance of the algorithms not only in terms of the number of key comparisons required by the algorithms but also in terms of the number of bit comparisons. Some of the standard sorting and searching algorithms have been analyzed with respect to key comparisons but not with respect to bit comparisons. In this paper, we investigate the expected number of bit comparisons required by Quickselect (also known as Find). We develop exact and asymptotic formulae for the expected number of bit comparisons required to find the smallest or largest key by Quickselect and show that the expectation is asymptotically linear with respect to the number of keys. Similar results are obtained for the average case. For finding keys of arbitrary rank, we derive an exact formula for the expected number of bit comparisons that (using rational arithmetic) requires only finite summation (rather than such operations as numerical integration) and use it to compute the expectation for each target rank.
AMS 2000 subject classifications. Primary 68W40; secondary 68P10, 60C05.
Key words and phrases. Quickselect, Find, searching algorithms, asymptotics, average-case analysis, key comparisons, bit comparisons.
Date. June 15, 2007.
1 Introduction and Summary
When an algorithm for sorting or searching is analyzed, the
algorithm is usually regarded either as comparing keys pairwise
irrespective of the keys’ internal structure or as operating on
representations (such as bit strings) of keys. In the former case,
analyses often quantify the performance of the algorithm in terms of
the number of key comparisons required to accomplish the task;
Quickselect (also known as Find) is an example of those algorithms
that have been studied from this point of view. In the latter case,
if keys are represented as bit strings, then analyses quantify the
performance of the algorithm in terms of the number of bits compared
until it completes its task. Digital search trees, for example,
have been examined from this perspective.
In order to fully quantify the performance of a sorting or searching algorithm and enable comparison between key-based and digital algorithms, it is ideal to analyze the algorithm from both points of view. However, to date, only Quicksort has been analyzed with both approaches; see Fill and Janson. Before their study, Quicksort had been extensively examined with regard to the number of key comparisons performed by the algorithm (e.g., Knuth, Régnier, Rösler, Knessl and Szpankowski, Fill and Janson, Neininger and Rüschendorf), but it had not been examined with regard to the number of bit comparisons in sorting keys represented as bit strings. In their study, Fill and Janson assumed that keys are independently and uniformly distributed over (0,1) and that the keys are represented as bit strings. [They also conducted the analysis for a general absolutely continuous distribution over (0,1).] They showed that the expected number of bit comparisons required to sort $n$ keys is asymptotically equivalent to $n(\ln n)(\lg n)$, as compared to the lead-order term of the expected number of key comparisons, which is asymptotically $2n\ln n$. We use ln and lg to denote natural and binary logarithms, respectively, and use log when the base does not matter (for example, in remainder estimates).
In this paper, we investigate the expected number of bit comparisons required by Quickselect. Hoare introduced this search algorithm, which is treated in most textbooks on algorithms and data structures. Quickselect selects the $m$-th smallest key (we call it the rank-$m$ key) from a set of $n$ distinct keys. (The keys are typically assumed to be distinct, but the algorithm still works, with a minor adjustment, even if they are not distinct.) The algorithm finds the target key in a recursive and random fashion. First, it selects a pivot uniformly at random from the $n$ keys. Let $k$ denote the rank of the pivot. If $k = m$, then the algorithm returns the pivot. If $k > m$, then the algorithm recursively operates on the set of keys smaller than the pivot and returns the rank-$m$ key. Similarly, if $k < m$, then the algorithm recursively operates on the set of keys larger than the pivot and returns the $(m-k)$-th smallest key from the subset. Although previous studies (e.g., Knuth, Mahmoud et al., Grübel and Rösler, Lent and Mahmoud, Mahmoud and Smythe, Devroye, Hwang and Tsai) examined Quickselect with regard to key comparisons, this study is the first to analyze the bit complexity of the algorithm.
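The recursion just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function name `quickselect`, the parameter names `keys`, `m`, and `rng`, and the use of list comprehensions to split the keys are all assumptions made for the example.

```python
import random

def quickselect(keys, m, rng=random):
    """Return the rank-m (m-th smallest, 1-based) key of a list of distinct keys."""
    if not 1 <= m <= len(keys):
        raise ValueError("m must satisfy 1 <= m <= len(keys)")
    pivot = rng.choice(keys)                  # pivot chosen uniformly at random
    smaller = [x for x in keys if x < pivot]  # keys smaller than the pivot
    larger = [x for x in keys if x > pivot]   # keys larger than the pivot
    k = len(smaller) + 1                      # rank of the pivot
    if k == m:
        return pivot
    if k > m:
        # The target lies among the smaller keys and keeps its rank m.
        return quickselect(smaller, m, rng)
    # The target lies among the larger keys; its rank there is m - k.
    return quickselect(larger, m - k, rng)

if __name__ == "__main__":
    sample = [0.42, 0.17, 0.93, 0.08, 0.55]
    print(quickselect(sample, 1))  # smallest key of the sample
```

The sketch splits the keys in two passes for readability, so it is not meant for counting comparisons; a version that counts them would compare each remaining key with the pivot exactly once.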
We suppose that the algorithm is applied to $n$ distinct keys that are represented as bit strings and that the algorithm operates on individual bits in order to find a target key. We also assume that the keys are uniformly and independently distributed in $(0,1)$. For instance, consider applying Quickselect to find the smallest key among three keys $x_1$, $x_2$, and $x_3$ whose binary representations are .01001100…, .00110101…, and .00101010…, respectively. If the algorithm selects $x_2$ as a pivot, then it compares each of $x_1$ and $x_3$ to $x_2$ in order to determine the rank of $x_2$. When $x_1$ and $x_2$ are compared, the algorithm requires 2 bit comparisons to determine that $x_2$ is smaller than $x_1$ because the two keys have the same first digit and differ at the second digit. Similarly, when $x_3$ and $x_2$ are compared, the algorithm requires 4 bit comparisons to determine that $x_3$ is smaller than $x_2$. After these comparisons, key $x_3$ has been identified as smallest. Hence the search for the smallest key requires a total of 6 bit comparisons (resulting from the two key comparisons).
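The count in this example can be reproduced with a short Python sketch. The keys are labelled `x1`, `x2`, `x3` here in their order of appearance, with the second key as the pivot; the helper name `bit_comparisons` and the truncation of each key to its first eight displayed bits are assumptions made for illustration.

```python
def bit_comparisons(a, b):
    """Number of bit comparisons needed to distinguish two bit strings:
    the 1-based index of the first position at which they differ."""
    for index, (bit_a, bit_b) in enumerate(zip(a, b), start=1):
        if bit_a != bit_b:
            return index
    raise ValueError("keys agree on every available bit")

# First eight bits of the three keys from the example above.
x1, x2, x3 = "01001100", "00110101", "00101010"

pivot = x2
counts = [bit_comparisons(key, pivot) for key in (x1, x3)]
total = sum(counts)
print(counts, total)  # [2, 4] 6
```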
We let $\mu(m,n)$ denote the expected number of bit comparisons required to find the rank-$m$ key in a file of $n$ keys by Quickselect. By symmetry, $\mu(m,n) = \mu(n+1-m,n)$. First, we develop exact and asymptotic formulae for $\mu(1,n)$, the expected number of bit comparisons required to find the smallest key by Quickselect, as summarized in the following theorem.
Theorem 1.1. The expected number $\mu(1,n)$ of bit comparisons required by Quickselect to find the smallest key in a file of $n$ keys that are independently and uniformly distributed in $(0,1)$ has the following exact and asymptotic expressions:
where and denote harmonic and Bernoulli numbers, respectively, and, with and , we define
The asymptotic formula shows that the expected number of bit
comparisons is asymptotically linear in $n$ with the lead-order
coefficient approximately equal to 5.27938. Hence the expected
number of bit comparisons is asymptotically different from that of
key comparisons required to find the smallest key only by a
constant factor (the expectation for key comparisons is
asymptotically $2n$). Complex-analytical methods are utilized to
obtain the asymptotic formula. Details of the derivations of the
formulae are described in Section 3.
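As a rough numerical illustration of this linear growth, the following Python sketch estimates by simulation the expected number of bit comparisons needed to find the smallest key, divided by the number of keys. It is a Monte Carlo sketch under stated assumptions, not the exact computation of the paper: keys are truncated to `BITS = 64` random bits, and the names `bit_comparisons`, `random_keys`, `bits_to_find_min`, and `estimate_ratio` are hypothetical.

```python
import random

BITS = 64  # assumption: each uniform key is truncated to its first 64 bits

def bit_comparisons(a, b):
    """Bits examined to decide the order of two distinct BITS-bit keys."""
    # The highest set bit of a ^ b marks the first position where they differ.
    return BITS - (a ^ b).bit_length() + 1

def random_keys(n, rng):
    """Draw n distinct keys, each an integer encoding BITS random bits."""
    keys = set()
    while len(keys) < n:
        keys.add(rng.getrandbits(BITS))
    return list(keys)

def bits_to_find_min(keys, rng):
    """Run Quickselect for rank 1 and return the total number of bit comparisons."""
    total = 0
    while True:
        pivot = rng.choice(keys)          # pivot chosen uniformly at random
        smaller = []
        for x in keys:
            if x != pivot:
                total += bit_comparisons(x, pivot)
                if x < pivot:
                    smaller.append(x)
        if not smaller:                   # the pivot has rank 1: done
            return total
        keys = smaller                    # recurse on the keys below the pivot

def estimate_ratio(n, trials, seed=0):
    """Average bit comparisons per key over independent simulated runs."""
    rng = random.Random(seed)
    total = sum(bits_to_find_min(random_keys(n, rng), rng) for _ in range(trials))
    return total / (trials * n)

if __name__ == "__main__":
    print(estimate_ratio(n=200, trials=200))
```

For moderate file sizes the estimate should be of the same order as the lead-order coefficient, though lower-order terms and sampling noise keep it from matching that constant exactly.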
We also derive exact and asymptotic expressions for the expected number of bit comparisons for the average case. We denote this expectation by $\mu(\bar{m},n)$. In the average case, the parameter $m$ in $\mu(m,n)$ is considered a discrete uniform random variable; hence $\mu(\bar{m},n) = \frac{1}{n}\sum_{m=1}^{n}\mu(m,n)$. The derived asymptotic formula shows that $\mu(\bar{m},n)$ is also asymptotically linear in $n$; see (4.48). More detailed results for $\mu(\bar{m},n)$ are described in Section 4.
Lastly, in Section 5, we derive an exact expression of $\mu(m,n)$ for each fixed $m$ that is suited for computations. Our preliminary exact formula for $\mu(m,n)$ [shown in (2.8)] entails infinite summation and integration. As a result, it is not a desirable form for numerically computing the expected number of bit comparisons. Hence we establish another exact formula that only requires finite summation and use it to compute $\mu(m,n)$ for , . The computation leads to the following conjectures: (i) for fixed $n$, $\mu(m,n)$ increases in $m$ for $m \le (n+1)/2$ and is symmetric about $(n+1)/2$; and (ii) for fixed $m$, $\mu(m,n)$ increases in $n$ (asymptotically linearly).
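To investigate the bit complexity of Quickselect, we follow the general approach developed by Fill and Janson. Let $U_1,\dots,U_n$ denote the keys uniformly and independently distributed on (0, 1), and let $U_{(i)}$ denote the rank-$i$ key. Then, for $1 \le i < j \le n$ (assume $n \ge 2$),
$$P\{U_{(i)} \text{ and } U_{(j)} \text{ are compared}\} = \begin{cases} \dfrac{2}{j-m+1} & \text{if } m \le i, \\[1ex] \dfrac{2}{j-i+1} & \text{if } i < m < j, \\[1ex] \dfrac{2}{m-i+1} & \text{if } j \le m. \end{cases} \tag{2.1}$$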
To determine the first probability in (2.1), note that
$U_{(m)},\dots,U_{(j)}$ remain in the same subset until the first
time that one of them is chosen as a pivot. Therefore, $U_{(i)}$ and
$U_{(j)}$ are compared if and only if the first pivot chosen from
$U_{(m)},\dots,U_{(j)}$ is either $U_{(i)}$ or $U_{(j)}$. Analogous
arguments establish the other two cases.
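The pivot argument can be checked mechanically for small files. The Python sketch below computes, in exact rational arithmetic, the probability that the rank-`i` and rank-`j` keys are compared when Quickselect searches for rank `m` among `n` keys, by recursing over the random pivot rank. It then compares the result with the standard three-case closed form: 2/(j−m+1) when m ≤ i, 2/(j−i+1) when i < m < j, and 2/(m−i+1) when j ≤ m. The function names and the brute-force recursion are assumptions made for illustration, and the closed form is stated here as the quantity being tested.

```python
from fractions import Fraction
from functools import lru_cache

def compare_prob_exact(i, j, m, n):
    """Exact probability that ranks i < j are compared when seeking rank m of n."""
    @lru_cache(maxsize=None)
    def p(lo, hi):
        # Current subfile holds ranks lo..hi and always contains rank m.
        if not (lo <= i and j <= hi):
            return Fraction(0)          # the two keys have been separated
        total = Fraction(0)
        for k in range(lo, hi + 1):     # k = rank of the uniformly chosen pivot
            if k == i or k == j:
                total += 1              # pivot is compared with every other key
            elif k == m:
                continue                # target found; i and j never compared
            elif k > m:
                total += p(lo, k - 1)   # continue among the smaller keys
            else:
                total += p(k + 1, hi)   # continue among the larger keys
        return total / (hi - lo + 1)
    return p(1, n)

def compare_prob_formula(i, j, m):
    """Three-case closed form for the same probability."""
    if m <= i:
        return Fraction(2, j - m + 1)
    if m < j:
        return Fraction(2, j - i + 1)
    return Fraction(2, m - i + 1)

if __name__ == "__main__":
    print(compare_prob_exact(2, 3, 1, 3), compare_prob_formula(2, 3, 1))
```

The recursion only enumerates pivot ranks, so it relies on nothing beyond the description of the algorithm; agreement with the closed form for small `n` is a sanity check, not a proof.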
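For $1 \le i < j \le n$, it is well known that the joint density function of $U_{(i)}$ and $U_{(j)}$ is given by
$$f_{U_{(i)},U_{(j)}}(x,y) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\,x^{i-1}(y-x)^{j-i-1}(1-y)^{n-j}, \qquad 0 < x < y < 1. \tag{2.2}$$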
Clearly, the event that and are compared is independent of the random variables and . Hence, defining
[the sums in (2.3)–(2.5) are double sums over and ], and letting denote the index of the first bit at which the keys and differ, we can write the expectation of the number of bit comparisons required to find the rank- key in a file of keys as
in this expression, note that represents the last bit at which and agree.
3 Analysis of $\mu(1,n)$
3.1 Exact Computation of $\mu(1,n)$
Making the change of variables and integrating, and recalling , we find, after some calculation,
To further transform (3.3), define
where denotes the -th Bernoulli number. Let . Then (see Knuth ), and
To simplify , note that
Plugging (3.7) into (LABEL:intermediateMu1_1) and recalling for , we finally obtain
where denotes the -th harmonic number and
The last equality in (3.8) follows from the easy identity
3.2 Asymptotic Analysis of $\mu(1,n)$
For , let and . Let denote Euler’s constant , and define . Then
and $H_n^{(2)}$ denotes the $n$-th harmonic number of order 2, i.e., $H_n^{(2)} := \sum_{i=1}^{n} i^{-2}$.
In this lemma, and are derived in order to
obtain the exact expression for in (iii). From
(3.8), the exact expression for also provides an
alternative exact expression for .
Before proving Lemma 3.1, we complete the proof of Theorem 1.1 using part (iii). We know
Now we prove Lemma 3.1:
it follows that
where is a positively oriented closed curve that
encircles the integers 0,, and does not include or
encircle any of the following points: (where ), ; ; and .
Equality (3.14) follows from the fact that the
Bernoulli numbers are extrapolated by the Riemann zeta function
taken at nonnegative integers: . [The
coefficients do not concern us since the Bernoulli numbers
of odd index greater than 1 vanish.] Equality
(3.15) follows from a direct application
of residue calculus, taking into account contributions of the simple
poles at the integers 0,, .
Let denote the integrand in (3.15):
We consider a positively oriented rectangular contour with horizontal sides and , where , , and vertical sides and , where . By elementary bounds on along and the fact that
(this is implicit on page 113 of Flajolet and Sedgewick  and explicitly proved in the Appendix), one can show that
Accounting for residues due to the poles encircled by , we obtain
(ii) We have . Hence, from (i),