# Worst-Case Efficient Sorting with QuickMergesort

Stefan Edelkamp, King’s College London, UK. Armin Weiß, Universität Stuttgart, Germany. Supported by the DFG grant DI 435/7-1.

###### Abstract

The two most prominent solutions for the sorting problem are Quicksort and Mergesort. While Quicksort is very fast on average, Mergesort additionally gives worst-case guarantees, but needs extra space for a linear number of elements. Worst-case efficient in-place sorting, however, remains a challenge: the standard solution, Heapsort, suffers from poor cache behavior and is also not overly fast for in-cache instances.

In this work we present median-of-medians QuickMergesort (MoMQuickMergesort), a new variant of QuickMergesort, which combines Quicksort with Mergesort allowing the latter to be implemented in place. Our new variant applies the median-of-medians algorithm for selecting pivots in order to circumvent the quadratic worst case. Indeed, we show that it uses at most $n \log n + 1.6n$ comparisons for $n$ large enough.

We experimentally confirm the theoretical estimates and show that the new algorithm outperforms Heapsort by far and is only around 10% slower than Introsort (std::sort implementation of libstdc++), which has a rather poor guarantee for the worst case. We also simulate the worst case, which is only around 10% slower than the average case. In particular, the new algorithm is a natural candidate to replace Heapsort as a worst-case stopper in Introsort.

keywords: in-place sorting, quicksort, mergesort, analysis of algorithms

## 1 Introduction

Sorting elements of some totally ordered universe has always been among the most important tasks carried out on computers. Comparison-based sorting of $n$ elements requires at least $\log n! \approx n\log n - 1.44n$ comparisons (where $\log$ is base 2). Up to constant factors this bound is achieved by the classical sorting algorithms Heapsort, Mergesort, and Quicksort. While Quicksort is usually considered the fastest one, the $\mathcal{O}(n\log n)$-bound applies only for its average case (both for the number of comparisons and running time) – in the worst case it deteriorates to a $\Theta(n^2)$ algorithm. The standard approach to prevent such a worst case is Musser’s Introsort: whenever the recursion depth of Quicksort becomes too large, the algorithm switches to Heapsort (we call this the worst-case stopper). This works well in practice for most instances. However, on small instances Heapsort is already considerably slower than Quicksort (more than 30% in our experiments) and on larger instances it suffers from its poor cache behavior (more than eight times slower than Quicksort in our experiments). This is also the reason why in practice it is mainly used as a worst-case stopper in Introsort.

Another approach for preventing Quicksort’s worst case is by using the median-of-medians algorithm for pivot selection. However, choosing the pivot as median of the whole array yields a bad average (and worst-case) running time. On the other hand, when choosing the median of a smaller sample as pivot, the average performance becomes quite good, but the guarantees for the worst case become even worse.

The third algorithm, Mergesort, is almost optimal in terms of comparisons: it uses only about $n\log n - 0.91n$ comparisons in the worst case to sort $n$ elements. Moreover, it performs well in terms of running time. Nevertheless, it is not used as worst-case stopper for Introsort because it needs extra space for a linear number of data elements. In recent years, several in-place (we use the term for at most logarithmic extra space) variants of Mergesort appeared, both stable ones (meaning that the relative order of elements comparing equal is not changed) [18, 23, 16] and unstable ones [6, 13, 16, 22]. Two of the most efficient implementations of stable variants are Wikisort and Grailsort. An example of an unstable in-place Mergesort implementation is in-situ Mergesort. It uses Quick/Introselect (std::nth_element) to find the median of the array. Then it partitions the array according to the median (i. e., it moves all smaller elements to the left and all greater elements to the right). Next, it sorts one half with Mergesort using the other half as temporary space, and, finally, sorts the other half recursively. Since the elements in the temporary space get mixed up (they are used as “dummy” elements), this algorithm is not stable. In-situ Mergesort gives an $\mathcal{O}(n\log n)$ bound for the worst case. As validated in our experiments, all the in-place variants are considerably slower than ordinary Mergesort.

When instead of the median an arbitrary element is chosen as the pivot, we obtain QuickMergesort, which is faster on average, at the price that the worst case can be quadratic. QuickMergesort follows the more general concept of QuickXsort: first, choose a pivot element and partition the array according to it. Then, sort one part with X and, finally, the other part recursively with QuickXsort. As for QuickMergesort, the part which is currently not being sorted can be used as temporary space for X.

Other examples of QuickXsort are QuickHeapsort [5, 9], QuickWeakheapsort [10, 11], and Ultimate Heapsort. QuickXsort with median-of-$\sqrt{n}$ pivot selection uses at most $n\log n + cn + o(n)$ comparisons on average to sort $n$ elements given that X also uses at most $n\log n + cn + o(n)$ comparisons on average. Moreover, recently Wild showed that, if the pivot is selected as median of some constant size sample, then the average number of comparisons of QuickXsort is only some small linear term (depending on the sample size) above the average number of comparisons of X. However, as long as no linear size samples are used for pivot selection, QuickXsort does not provide good bounds for the worst case. This defect is overcome in Ultimate Heapsort by using the median of the whole array as pivot. In Ultimate Heapsort the median-of-medians algorithm (which is linear in the worst case) is used for finding the median, leading to an $n\log n + \mathcal{O}(n)$ bound for the number of comparisons. Unfortunately, due to the large constant of the median-of-medians algorithm, the $\mathcal{O}(n)$-term is quite big.

#### Contribution.

In this work we introduce median-of-medians QuickMergesort (MoMQuickMergesort) as a variant of QuickMergesort using the median-of-medians algorithm for pivot selection. The crucial observation is that it is not necessary to use the median of the whole array as pivot, but only to guarantee that the pivot is not very far off the median. This observation allows us to apply the median-of-medians algorithm to smaller samples, leading to both a better average- and worst-case performance. Our algorithm is based on a merging procedure introduced by Reinhardt, which requires less temporary space than the usual merging. A further improvement, which we call undersampling (taking fewer elements for pivot selection into account), allows us to reduce the worst-case number of comparisons down to $n\log n + 1.59n + \mathcal{O}(n^{0.8})$. Moreover, we heuristically estimate the average case as $n\log n + 0.275n + o(n)$ comparisons. The good average case comes partially from the fact that we introduce a new way of adaptive pivot selection for the median-of-medians algorithm. Our experiments confirm the theoretical and heuristic estimates and also show that MoMQuickMergesort is competitive with other algorithms (in our experiments more than 7 times faster than Heapsort and around 10% slower than Introsort (std::sort – throughout this refers to its libstdc++ implementation)). Moreover, we apply MoMQuickMergesort (instead of Heapsort) as a worst-case stopper for Introsort (std::sort). The results are striking: on special permutations, the new variant is up to six times faster than the original version of std::sort.

#### Outline.

In Section 2, we recall QuickMergesort and the median-of-medians algorithm. In Section 3, we describe median-of-medians QuickMergesort, introduce the improvements and analyze the worst-case and average-case behavior. Finally, in Section 4, we present our experimental results.

## 2 Preliminaries

Throughout we use standard $\mathcal{O}$ and $\Theta$ notation. The logarithm $\log$ always refers to base 2. A pseudomedian of nine (resp. fifteen) elements is computed as follows: group the elements in groups of three elements and compute the median of each group. The pseudomedian is the median of these three (resp. five) medians.

Throughout, in our estimates we assume that the median of three (resp. five) elements is computed using three (resp. seven) comparisons regardless of the outcome of previous comparisons. This allows a branch-free implementation of the comparisons.
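To make these comparison counts concrete, the following Python sketch computes medians with a fixed sequence of compare-exchange steps and counts the comparisons. The helper names (`CountingLess`, `cswap`, `median5`, and so on) and the particular seven-step median-of-five network are illustrative assumptions, not taken from our C++ implementation. Python does not execute these steps branch-free; the sketch only shows that the number of comparisons does not depend on the input.

```python
class CountingLess:
    """Comparison functor that counts how often it is called."""

    def __init__(self):
        self.count = 0

    def __call__(self, x, y):
        self.count += 1
        return x < y


def cswap(x, y, less):
    """Compare-exchange: return the pair in non-decreasing order (one comparison)."""
    return (y, x) if less(y, x) else (x, y)


def median3(a, b, c, less):
    """Median of three with exactly three comparisons."""
    a, b = cswap(a, b, less)
    b, c = cswap(b, c, less)
    a, b = cswap(a, b, less)
    return b


def median5(a, b, c, d, e, less):
    """Median of five with exactly seven comparisons (fixed network)."""
    a, b = cswap(a, b, less)
    d, e = cswap(d, e, less)
    a, d = cswap(a, d, less)  # a is now the minimum of the four values a, b, d, e
    b, e = cswap(b, e, less)  # e is now the maximum of the four values a, b, d, e
    # The median of all five is the median of the three remaining candidates.
    b, c = cswap(b, c, less)
    c, d = cswap(c, d, less)
    b, c = cswap(b, c, less)
    return c


def pseudomedian9(xs, less):
    """Median of the three group medians: 4 * 3 = 12 comparisons."""
    medians = [median3(xs[i], xs[i + 1], xs[i + 2], less) for i in (0, 3, 6)]
    return median3(*medians, less)


def pseudomedian15(xs, less):
    """Median of the five group medians: 5 * 3 + 7 = 22 comparisons."""
    medians = [median3(xs[i], xs[i + 1], xs[i + 2], less) for i in range(0, 15, 3)]
    return median5(*medians, less)
```

Passing a fresh `CountingLess()` to `pseudomedian9` or `pseudomedian15` and reading its `count` afterwards gives the totals used in the estimates of this paper.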

In this paper we have to deal with simple recurrences of two types, which both have straightforward solutions:

###### Lemma 2.1:

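Let $\alpha, \beta, \gamma, \delta, A, C, D, N_0$ be positive real constants with $\alpha + \beta < 1$ and $\gamma = 1 - \alpha$, and let

$$
\begin{aligned}
T(n) &\le T(\lceil \alpha n\rceil + A) + T(\lceil \beta n\rceil + A) + Cn + D,\\
Q(n) &\le Q(\lceil \alpha n\rceil + A) + \gamma n\log(\gamma n) + Cn + \mathcal{O}(n^{\delta})
\end{aligned}
$$

for $n \ge N_0$ and $T(n), Q(n) \le D$ for $n \le N_0$ ($N_0$ large enough). Moreover, let $\zeta \in \mathbb{R}$ such that $\alpha^{\zeta} + \beta^{\zeta} = 1$ (notice that $\zeta < 1$). Then

$$
T(n) \le \frac{Cn}{1-\alpha-\beta} + \mathcal{O}(n^{\zeta}) \quad\text{and}\quad Q(n) \le n\log n + \left(\frac{\alpha\log\alpha}{\gamma} + \log\gamma + \frac{C}{\gamma}\right) n + \mathcal{O}(n^{\delta}).
$$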

###### Proof

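It is well-known that $T(n)$ has a linear solution. Therefore, (after replacing $T(n)$ by a reasonably smooth function) $T(\lceil \alpha n\rceil + A)$ and $T(\alpha n)$ differ by at most some constant. Thus, after increasing $D$, we may assume that $T(n)$ is of the simpler form

$$
T(n) \le T(\alpha n) + T(\beta n) + Cn + D. \tag{1}
$$

We can split (1) into two recurrences

$$
T_C(n) \le T_C(\alpha n) + T_C(\beta n) + Cn \quad\text{and}\quad T_D(n) \le T_D(\alpha n) + T_D(\beta n) + D
$$

with $T_C(n) = 0$ and $T_D(n) \le D$ for $n \le N_0$. For $T_C$ we get the solution $T_C(n) \le \frac{Cn}{1-\alpha-\beta}$. By the generalized Master theorem, it follows that $T_D(n) \in \mathcal{O}(n^{\zeta})$ where $\zeta$ satisfies $\alpha^{\zeta} + \beta^{\zeta} = 1$. Thus,

$$
T(n) \le \frac{Cn}{1-\alpha-\beta} + \mathcal{O}(n^{\zeta}).
$$

Now, let us consider the recurrence for $Q(n)$. With the same argument as before we have $Q(n) \le Q(\alpha n) + \gamma n\log(\gamma n) + Cn + \mathcal{O}(n^{\delta})$. Thus, we obtain

$$
\begin{aligned}
Q(n) &\le \sum_{i=0}^{\log_{1/\alpha} n}\Bigl(\alpha^i\gamma n\log(\alpha^i\gamma n) + C\alpha^i n + \mathcal{O}\bigl((\alpha^i n)^{\delta}\bigr)\Bigr)\\
&= n\sum_{i\ge 0}\Bigl(\alpha^i\gamma(\log n + i\log\alpha + \log\gamma) + C\alpha^i\Bigr) + \mathcal{O}(n^{\delta})\\
&= \frac{\gamma}{1-\alpha}\,n\log n + \left(\frac{\alpha\gamma\log\alpha}{(\alpha-1)^2} + \frac{\gamma\log\gamma}{1-\alpha} + \frac{C}{1-\alpha}\right)n + \mathcal{O}(n^{\delta})\\
&= n\log n + \left(\frac{\alpha\log\alpha}{1-\alpha} + \log\gamma + \frac{C}{1-\alpha}\right)n + \mathcal{O}(n^{\delta}).
\end{aligned}
$$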

This proves Lemma 2.1.

### 2.1 QuickMergesort

QuickMergesort follows the design pattern of QuickXsort: let X be some sorting algorithm (in our case X = Mergesort). QuickXsort works as follows: first, choose some pivot element and partition the array according to this pivot, i. e., rearrange it such that all elements left of the pivot are less than or equal to the pivot and all elements on the right are greater than or equal to it. Then, choose one part of the array and sort it with the algorithm X. After that, sort the other part of the array recursively with QuickXsort. The main advantage of this procedure is that the part of the array that is not being sorted currently can be used as temporary memory for the algorithm X. This yields fast in-place variants for various external sorting algorithms such as Mergesort. The idea is that whenever a data element should be moved to the extra (additional or external) element space, it is instead swapped with the data element occupying the respective position in the part of the array which is used as temporary memory.

The most promising example for QuickXsort is QuickMergesort. For the Mergesort part we use standard (top-down) Mergesort, which can be implemented using $m$ extra element spaces to merge two arrays of length $m$: after the partitioning, one part of the array – for a simpler description we assume the first part – has to be sorted with Mergesort (note, however, that any of the two sides can be sorted with Mergesort as long as the other side contains at least $n/3$ elements). In order to do so, the second half of this first part is sorted recursively with Mergesort while moving the elements to the back of the whole array. The elements from the back of the array are inserted as dummy elements into the first part. Then, the first half of the first part is sorted recursively with Mergesort while being moved to the position of the former second half of the first part. Now, at the front of the array, there is enough space (filled with dummy elements) such that the two halves can be merged. The executed stages of the algorithm QuickMergesort are illustrated in Figure 1.
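The QuickXsort pattern with X = Mergesort can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not our implementation: the pivot is simply the middle element (so the quadratic worst case remains), the partitioning is a plain Lomuto scheme, and the Mergesort part swaps the sorted left half into the buffer before merging instead of moving halves around as described above. All function names are hypothetical.

```python
def _partition(a, lo, hi):
    """Lomuto partition of a[lo:hi] around its middle element; returns the pivot index."""
    mid = (lo + hi) // 2
    a[mid], a[hi - 1] = a[hi - 1], a[mid]
    pivot = a[hi - 1]
    store = lo
    for k in range(lo, hi - 1):
        if a[k] < pivot:
            a[k], a[store] = a[store], a[k]
            store += 1
    a[store], a[hi - 1] = a[hi - 1], a[store]
    return store


def _mergesort_with_buffer(a, lo, hi, buf):
    """Sort a[lo:hi]; a[buf:buf + (hi - lo) // 2] serves as swap space."""
    if hi - lo < 2:
        return
    mid = (lo + hi) // 2
    _mergesort_with_buffer(a, lo, mid, buf)
    _mergesort_with_buffer(a, mid, hi, buf)
    left_len = mid - lo
    for k in range(left_len):  # swap the sorted left half into the buffer
        a[buf + k], a[lo + k] = a[lo + k], a[buf + k]
    i, j, w = 0, mid, lo
    while i < left_len and j < hi:  # merge back, swapping instead of copying
        if a[j] < a[buf + i]:
            a[w], a[j] = a[j], a[w]
            j += 1
        else:
            a[w], a[buf + i] = a[buf + i], a[w]
            i += 1
        w += 1
    while i < left_len:  # remaining right elements are already in place
        a[w], a[buf + i] = a[buf + i], a[w]
        i += 1
        w += 1


def quick_mergesort(a):
    """Sort the list a in place and return it."""
    lo, hi = 0, len(a)
    while hi - lo > 1:
        p = _partition(a, lo, hi)
        sides = sorted([(lo, p), (p + 1, hi)], key=lambda s: s[1] - s[0])
        (s_lo, s_hi), (b_lo, b_hi) = sides  # smaller and larger side
        if s_hi - s_lo >= (b_hi - b_lo) // 2:
            # The smaller side is a large enough buffer for the larger side.
            _mergesort_with_buffer(a, b_lo, b_hi, s_lo)
            lo, hi = s_lo, s_hi
        else:
            _mergesort_with_buffer(a, s_lo, s_hi, b_lo)
            lo, hi = b_lo, b_hi
    return a
```

Because elements are only ever swapped, the buffer side keeps its multiset of elements (in permuted order), which is why it can afterwards be sorted by the next round of the loop.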

### 2.2 The median-of-medians algorithm

The median-of-medians algorithm solves the selection problem: given an array $A[1,\dots,n]$ and an integer $k \in \{1,\dots,n\}$, find the $k$-th element in the sorted order of $A$. For simplicity let us assume that all elements are distinct – in Section 2.3 we show how to deal with the general case with duplicates.

The basic variant of the median-of-medians algorithm (see also [8, Sec. 9.3]) works as follows: first, the array is grouped into blocks of five elements. From each of these blocks the median is selected and then the median of all these medians is computed recursively. This yields a provably good pivot for performing a partitioning step. Now, depending on which side the $k$-th element is, recursion takes place on the left or right side. It is well-known that this algorithm runs in linear time with a rather big constant in the $\mathcal{O}$-notation. We use a slight improvement:

#### Repeated step algorithm.

Instead of grouping into blocks of 5 elements, we group into blocks of 9 elements and take the pseudomedian (“ninther”) into the sample for pivot selection. This method guarantees that every element in the sample has 4 elements less than or equal and 4 elements greater than or equal to it (counting the sample element itself). Thus, when selecting the pivot as median of the sample of $n/9$ elements, the guarantee is that at least $2n/9$ elements are less than or equal and the same number greater than or equal to the pivot. Since there might remain 8 elements outside the sample, we obtain the recurrence

$$
T_{MoM}(n) \le T_{MoM}(\lceil n/9\rceil) + T_{MoM}(\lceil 7n/9\rceil + 8) + \frac{20n}{9},
$$

where $\frac{4n}{3}$ of $\frac{20n}{9}$ is due to finding the pseudomedians and $\frac{8n}{9}$ is for partitioning the remaining (non-pseudomedian) elements according to the pivot (notice that also some of the other elements are already known to be greater/smaller than the pivot; however, using this information would introduce a huge bookkeeping overhead). Thus, by Lemma 2.1, we have:

###### Lemma 2.2 ([1, 7]):

$T_{MoM}(n) \le 20n + \mathcal{O}(n^{\zeta})$ where $\zeta \approx 0.78$ satisfies $(1/9)^{\zeta} + (7/9)^{\zeta} = 1$.
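The two constants of the repeated step recurrence, with recursive calls on roughly $n/9$ and $7n/9$ elements and $\frac{20n}{9}$ additional comparisons, can be checked numerically. The sketch below is only an illustration; the function names are hypothetical, and the exponent is found by plain bisection on $\alpha^{\zeta} + \beta^{\zeta} = 1$.

```python
def linear_constant(alpha, beta, c):
    """Leading constant c / (1 - alpha - beta) of T(n) <= T(alpha n) + T(beta n) + c n."""
    return c / (1.0 - alpha - beta)


def solve_zeta(alpha, beta, iterations=100):
    """Find zeta in (0, 1) with alpha**zeta + beta**zeta == 1 by bisection."""
    lo, hi = 0.0, 1.0  # the left-hand side is 2 at zeta = 0 and alpha + beta < 1 at zeta = 1
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if alpha ** mid + beta ** mid > 1.0:
            lo = mid  # the sum decreases in zeta, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0


MOM_CONSTANT = linear_constant(1 / 9, 7 / 9, 20 / 9)
MOM_ZETA = solve_zeta(1 / 9, 7 / 9)
```

With these inputs the leading constant evaluates to 20 and the exponent lies slightly above 0.78, which is why the lower-order term can be bounded by $\mathcal{O}(n^{0.8})$.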

For our implementation we apply a slight improvement over the basic median-of-medians algorithm by using the approach of adaptive pivot selection, which was first used in the Floyd-Rivest algorithm [14, 15], later applied to smaller samples for Quickselect [26, 27], and recently applied to the median-of-medians algorithm. However, we use a different approach: in any case we choose the sample of size $n/9$ as pseudomedians of nine. Now, if the position we are looking for is on the far left (left of position $2n/9$), we do not choose the median of the sample as pivot but a smaller position: for searching the $k$-th element with $k \le 2n/9$, we take the $\lceil k/4\rceil$-th element of the sample as pivot. Notice that for $k = 2n/9$, this is exactly the median of the sample. Since every element of the sample carries at least four elements less than or equal to it (including itself), this guarantees that at least $4\lceil k/4\rceil \ge k$ elements are smaller than or equal to the pivot – so the $k$-th element will lie in the left part after partitioning (which is presumably the smaller one). Likewise, when searching a far right position, we proceed symmetrically.
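A compact way to see adaptive pivot selection at work is a list-based selection routine. The sketch below is an assumption-laden illustration rather than our in-place implementation: it copies elements into new lists, uses three-way splitting to cope with duplicates, and falls back to sorting for fewer than 45 elements. The names `sample_rank` and `mom_select` are hypothetical; ranks are 1-based.

```python
def median3(x, y, z):
    """Median of three with three comparisons."""
    if y < x:
        x, y = y, x
    if z < y:
        y, z = z, y
    if y < x:
        x, y = y, x
    return y


def ninther(block):
    """Pseudomedian of nine elements."""
    return median3(median3(block[0], block[1], block[2]),
                   median3(block[3], block[4], block[5]),
                   median3(block[6], block[7], block[8]))


def sample_rank(n, k, s):
    """Rank (1-based) of the pivot inside a sample of s ninthers when searching rank k of n."""
    if 9 * k <= 2 * n:                # far left: ceil(k / 4)-th smallest sample element
        r = (k + 3) // 4
    elif 9 * (n - k + 1) <= 2 * n:    # far right: symmetric choice counted from the top
        r = s + 1 - (n - k + 4) // 4
    else:                             # otherwise: median of the sample
        r = (s + 1) // 2
    return min(max(r, 1), s)


def mom_select(items, k):
    """Return the k-th smallest element of items (1 <= k <= len(items))."""
    a = list(items)
    while len(a) >= 45:
        n = len(a)
        sample = [ninther(a[i:i + 9]) for i in range(0, n - 8, 9)]
        pivot = mom_select(sample, sample_rank(n, k, len(sample)))
        smaller = [x for x in a if x < pivot]
        larger = [x for x in a if pivot < x]
        if k <= len(smaller):
            a = smaller
        elif k > n - len(larger):
            k -= n - len(larger)
            a = larger
        else:
            return pivot  # the k-th element equals the pivot
    return sorted(a)[k - 1]
```

For example, `mom_select(data, (len(data) + 1) // 2)` returns a median of `data`. The result does not depend on which sample rank is chosen; the adaptive choice only affects how small the part kept for the next round is expected to be.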

Notice that this optimization does not improve the worst-case but the average case (see Section 3.4).

### 2.3 Dealing with duplicate elements

With duplicates we mean that not all elements of the input array are distinct. The number of comparisons for finding the median of three (resp. five) elements does not change in the presence of duplicates. However, duplicates can lead to an uneven partition. The standard approach in Quicksort and Quickselect for dealing with duplicates is due to Bentley and McIlroy: in each partitioning step the elements equal to the pivot are placed in a third partition in the middle of the array. Recently, another approach appeared in the Quicksort implementation pdqsort. Instead of three-way partitioning it applies the usual two-way partitioning, always moving elements equal to the pivot to the right side. This method is also applied recursively – with one exception: if the new pivot is equal to an old pivot (this can be tested with one additional comparison), then all elements equal to the pivot are moved to the left side, which then can be excluded from recursion.

We propose to follow the latter approach: usually all elements equal to the pivot are moved to the right side – possibly leading to an even more unbalanced partitioning. However, whenever a partitioning step is very uneven (outside the guaranteed bounds for the pivot in the median-of-medians algorithm), we know that this must be due to many duplicate elements. In this case we immediately partition again with the same pivot but moving equal elements to the left.
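The proposed treatment of duplicates can be sketched as follows. The function names and the parameter `min_side` (standing for the number of elements the pivot guarantee promises on the left side) are hypothetical, and for brevity the second pass here repartitions only the right part, which yields the same final arrangement as partitioning the whole range again.

```python
def partition_equal_right(a, lo, hi, pivot):
    """Move elements < pivot to the front of a[lo:hi]; equal elements go right."""
    store = lo
    for k in range(lo, hi):
        if a[k] < pivot:
            a[k], a[store] = a[store], a[k]
            store += 1
    return store


def partition_equal_left(a, lo, hi, pivot):
    """Move elements <= pivot to the front of a[lo:hi]; equal elements go left."""
    store = lo
    for k in range(lo, hi):
        if not pivot < a[k]:
            a[k], a[store] = a[store], a[k]
            store += 1
    return store


def partition_with_duplicates(a, lo, hi, pivot, min_side):
    """Return (p, q) with a[lo:p] < pivot, a[p:q] == pivot (possibly empty), a[q:hi] >= pivot."""
    p = partition_equal_right(a, lo, hi, pivot)
    if p - lo >= min_side:
        return p, p  # balanced enough: nothing is known about equal elements
    # Too uneven for a pivot with the guarantee: this must be due to duplicates.
    q = partition_equal_left(a, p, hi, pivot)
    return p, q      # a[p:q] holds only copies of the pivot and needs no further sorting
```

When the second pass is triggered, the block `a[p:q]` can be excluded from the recursion, so a long run of equal elements no longer degrades the partition.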

## 3 Median-of-Medians QuickMergesort

Although QuickMergesort has an $\mathcal{O}(n^2)$ worst-case running time, it is quite simple to guarantee a worst-case number of comparisons of $n\log n + \mathcal{O}(n)$: just choose the median of the whole array as pivot. This is essentially how in-situ Mergesort works. The most efficient way for finding the median is using Quickselect as applied in in-situ Mergesort. However, this does not allow the desired bound on the number of comparisons (not even when using Introselect). Alternatively, we can use the median-of-medians algorithm described in Section 2.2, which, while having a linear worst-case running time, on average is quite slow. In this section we describe a variation of the median-of-medians approach which combines an $n\log n + \mathcal{O}(n)$ worst-case number of comparisons with a good average performance (both in terms of running time and number of comparisons).

### 3.1 Basic version

The crucial observation is that it is not necessary to use the actual median as pivot. As remarked in Section 2.1, the larger of the two sides of the partitioned array can be sorted with Mergesort as long as the smaller side contains at least one third of the total number of elements. Therefore, it suffices to find a pivot which guarantees such a partition. For doing so, we can apply the idea of the median-of-medians algorithm: for sorting an array of $n$ elements, we first choose $n/3$ elements as medians of three elements each. Then, the median-of-medians algorithm is used to find the median of those $n/3$ elements. This median becomes the next pivot. Like for the median-of-medians algorithm, this ensures that at least $2\cdot\lfloor n/6\rfloor$ elements are less than or equal and at least the same number of elements are greater than or equal to the pivot – thus, the larger part of the partitioned array can always be sorted with Mergesort and the recursion takes place on the smaller part. The advantage of this method is that the median-of-medians algorithm is applied to an array of size only $n/3$ instead of $n$ (at the cost of introducing a small overhead for finding the medians of three) – giving less weight to its big constant for the linear number of comparisons. We call this algorithm basic MoMQuickMergesort (bMQMS).

For the median-of-medians algorithm, we use the repeated step method as described in Section 2.2. Notice that for the number of comparisons the worst case for MoMQuickMergesort happens if the pivot is exactly the median since this gives the most weight on the “slow” median-of-medians algorithm. Thus, the total number of comparisons of MoMQuickMergesort in the worst case to sort $n$ elements is bounded by

$$
T_{bMQMS}(n) \le T_{bMQMS}(n/2) + T_{MS}(n/2) + T_{MoM}(n/3) + 3\cdot\frac{n}{3} + \frac{2}{3}n + \mathcal{O}(1)
$$

where $T_{MS}(n)$ is the number of comparisons of Mergesort and $T_{MoM}(n)$ the number of comparisons of the median-of-medians algorithm. The $3\cdot\frac{n}{3}$-term comes from finding $n/3$ medians of three elements, the $\frac{2}{3}n$ comparisons from partitioning the remaining elements (after finding the pivot, the correct side of the partition is known for $n/3$ elements).

By Lemma 2.2 we have $T_{MoM}(n) \le 20n + \mathcal{O}(n^{0.8})$ and for Mergesort we have $T_{MS}(n) \le n\log n - 0.91n + 1$. Thus, we can use Lemma 2.1 to resolve the recurrence, which proves (notice that for every comparison there is only a constant number of other operations):

###### Theorem 3.3:

Basic MoMQuickMergesort (bMQMS) runs in $\mathcal{O}(n\log n)$ time and performs at most $n\log n + 13.8n + \mathcal{O}(n^{0.8})$ comparisons.

### 3.2 Improved version

Reinhardt describes how to merge two subsequent sequences in an array using additional space for only half the number of elements in one of the two sequences. The additional space should be located in front of or after the two sequences. To be more precise, assume we are given an array $A$ with positions $A[1,\dots,t]$ being empty or containing dummy elements (to simplify the description, we assume the first case), and $A[t+1,\dots,t+\ell]$ and $A[t+\ell+1,\dots,t+\ell+r]$ containing two sorted sequences. We wish to merge the two sequences into the space $A[1,\dots,\ell+r]$ (so that $A[\ell+r+1,\dots,t+\ell+r]$ becomes empty). We require that $r/2 \le t < r$.

First we start from the left merging the two sequences into the empty space until there remains no empty space between the last element of the already merged part and the first element of the left sequence (first step in Figure 2). At this point, we know that at least $t$ elements of the right sequence have been introduced into the merged part (because when introducing elements from the left part, the distance between the last element in the already merged part and the first element in the left part does not decrease). Thus, the positions $t+\ell+1$ through $t+\ell+t$ are empty now. Since $t+\ell+1 \le \ell+r \le t+\ell+t$, in particular, $A[\ell+r]$ is empty now. Therefore, we can start merging the two sequences right-to-left into the now empty space (where the right-most element is moved to position $A[\ell+r]$ – see the second step in Figure 2). Once the empty space is filled, we know that all elements from the right part have been inserted, so $A[1,\dots,\ell+r]$ is sorted and $A[\ell+r+1,\dots,t+\ell+r]$ is empty (last step in Figure 2).

When choosing $\ell = r$ (in order to have a balanced merging and so an optimal number of comparisons), we need one fifth of the array as temporary space. Moreover, by allowing a slightly imbalanced merge we can also tolerate slightly less temporary space. In the case that the temporary space is large ($t \ge r$), we apply the merging scheme from Section 2.1. The situation where the temporary space is located after the two sorted sequences is handled symmetrically (note that this changes the requirement to $\ell/2 \le t < \ell$).
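The two-phase merge can be written down directly. In the sketch below the array is 0-indexed, `t` is the number of free slots in front, and `l` and `r` are the lengths of the left and right sorted runs; these names, the wrapper `merge_with_gap`, and the use of `None` as a dummy value are assumptions for illustration only. Elements are swapped rather than copied, so the dummies survive and end up behind the merged run.

```python
def reinhardt_merge(a, t, l, r):
    """Merge sorted runs a[t:t+l] and a[t+l:t+l+r] into a[0:l+r], given 2*t >= r."""
    assert len(a) == t + l + r and 2 * t >= r
    end_left, end_right = t + l, t + l + r
    w, i, j = 0, t, t + l
    # Phase 1: merge left-to-right while there is a gap in front of the left run.
    while w < i and i < end_left and j < end_right:
        if a[j] < a[i]:
            a[w], a[j] = a[j], a[w]
            j += 1
        else:
            a[w], a[i] = a[i], a[w]
            i += 1
        w += 1
    if i == end_left or j == end_right:
        # One run is exhausted: shift the rest of the other run into place.
        src, end = (j, end_right) if i == end_left else (i, end_left)
        while src < end:
            a[w], a[src] = a[src], a[w]
            w += 1
            src += 1
        return
    # Phase 2: the gap is closed (w == i). Exactly t right elements were consumed,
    # so a[t+l : t+l+t] is free and contains the target position l + r - 1.
    w, i_hi, j_hi = l + r - 1, end_left - 1, end_right - 1
    while j_hi >= j:
        if i_hi >= i and a[j_hi] < a[i_hi]:
            a[w], a[i_hi] = a[i_hi], a[w]
            i_hi -= 1
        else:
            a[w], a[j_hi] = a[j_hi], a[w]
            j_hi -= 1
        w -= 1
    # The remaining left elements a[i : i_hi + 1] are already in their final place.


def merge_with_gap(left, right, t):
    """Usage helper: build [gap | left | right], merge, and return the array."""
    a = [None] * t + list(left) + list(right)
    reinhardt_merge(a, t, len(left), len(right))
    return a
```

For instance, `merge_with_gap([1, 4, 6, 9], [2, 3, 5, 7], 2)` uses a gap of only half the length of the right run. Since the sketch never compares a dummy element, the `None` placeholders could be replaced by arbitrary array elements, as is the case inside QuickMergesort.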

By applying this merging method in MoMQuickMergesort, we can use pivots having much weaker guarantees: instead of one third, we need only one fifth of the elements being less (resp. greater) than the pivot. We can find such pivots by applying an idea similar to the repeated step method for the median-of-medians algorithm: first we group into blocks of fifteen elements and compute the pseudomedian of each group. Then, the pivot is selected as median of these pseudomedians; it is computed using the median-of-medians algorithm. This guarantees that at least $6\cdot\lfloor n/30\rfloor \approx n/5$ elements are less than or equal to (resp. greater than or equal to) the pivot. Computing the pseudomedian of 15 elements requires 22 comparisons (five times three comparisons for the medians of three and then seven comparisons for the median of five). After that, partitioning requires $\frac{14}{15}n$ comparisons. Since still in any case the larger half can be sorted with Mergesort, we get the recurrence (we call this algorithm MoMQuickMergesort (MQMS))

$$
\begin{aligned}
T_{MQMS}(n) &\le T_{MQMS}(n/2) + T_{MS}(n/2) + T_{MoM}(n/15) + \tfrac{22}{15}n + \tfrac{14}{15}n + \mathcal{O}(1)\\
&\le T_{MQMS}(n/2) + \tfrac{n}{2}\log(n/2) - 0.91\cdot\tfrac{n}{2} + \tfrac{20}{15}n + \tfrac{36}{15}n + \mathcal{O}(n^{0.8})\\
&\le n\log n - 0.91n - 2n + \tfrac{112}{15}n + \mathcal{O}(n^{0.8}) && \text{(by Lemma 2.1)}
\end{aligned}
$$

This proves:

###### Theorem 3.4:

MoMQuickMergesort (MQMS) runs in $\mathcal{O}(n\log n)$ time and performs at most $n\log n + 4.57n + \mathcal{O}(n^{0.8})$ comparisons.
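The linear coefficients in the bounds for the basic and the improved version follow from the closed form of Lemma 2.1 for recurrences of the shape $Q(n) \le Q(\alpha n) + (1-\alpha) n\log((1-\alpha)n) + Cn$. The following sketch simply evaluates that closed form; the function name and the way the per-level cost $C$ is assembled are illustrative assumptions.

```python
from math import log2


def lemma21_coefficient(alpha, c):
    """Linear coefficient alpha*log(alpha)/gamma + log(gamma) + c/gamma with gamma = 1 - alpha."""
    gamma = 1.0 - alpha
    return alpha * log2(alpha) / gamma + log2(gamma) + c / gamma


# Basic version: Mergesort on n/2 elements, median-of-medians on n/3 elements (20 * n/3),
# n comparisons for the medians of three and 2n/3 for partitioning.
BASIC = lemma21_coefficient(0.5, -0.91 / 2 + 20 / 3 + 1 + 2 / 3)

# Improved version: median-of-medians on n/15 elements (20 * n/15),
# 22n/15 for the pseudomedians of fifteen and 14n/15 for partitioning.
IMPROVED = lemma21_coefficient(0.5, -0.91 / 2 + 20 / 15 + 22 / 15 + 14 / 15)
```

Here `BASIC` evaluates to roughly 13.76 and `IMPROVED` to roughly 4.56, the coefficients of the linear terms after rounding up.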

Notice that when computing the median of pseudomedians of fifteen elements, in the worst case approximately the same effort goes into the calculation of the pseudomedians and into the median-of-medians algorithm. This indicates that it is an efficient method for finding a pivot with the guarantee that one fifth are greater or equal (resp. less or equal).

### 3.3 Undersampling

Alexandrescu selects pivots for the median-of-medians algorithm not as medians of medians of the whole array but only of $n/\phi$ elements where $\phi$ is some large constant (similar to what has been done for Quicksort). While this improves the average case considerably and still gives a linear time algorithm, the hidden constant for the worst case is large. In this section we follow the idea to a certain extent without losing a good worst-case bound.

As already mentioned in Section 3.2, Reinhardt’s merging procedure also works with less than one fifth of the whole array as temporary space if we do not require merging sequences of equal length. Thus, we can allow the pivot to be even further off the median, at the cost of making the Mergesort part more expensive due to imbalanced merging. For $\theta \ge 1$ we describe a variant of MoMQuickMergesort using only $n/\theta$ elements for sampling the pivot. Before we analyze this variant, let us look at the costs of Mergesort with imbalanced merging: in order to apply Reinhardt’s merging algorithm, we need that one part is at most twice the length of the temporary space. We always apply linear merging (no binary insertion), meaning that merging two sequences of combined length $n$ costs at most $n - 1$ comparisons. Thus, we get the following estimate for the worst-case number of comparisons $T_{MS,b}(n,m)$ of Mergesort, where $n$ is the number of elements to sort and $m$ is the temporary space (= “buffer”):

$$
T_{MS,b}(n,m) \le \begin{cases} T_{MS}(n) & \text{if } n \le 4m,\\ n + T_{MS,b}(n-2m,m) + T_{MS}(2m) & \text{otherwise.}\end{cases}
$$

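If $n > 4m$ (otherwise, there is nothing to do), this means

$$
\begin{aligned}
T_{MS,b}(n,m) \le{} & \left(\left\lceil\tfrac{n}{2m}\right\rceil - 2\right) T_{MS}(2m) + T_{MS}\!\left(n - \left(\left\lceil\tfrac{n}{2m}\right\rceil - 2\right)2m\right)\\
& + n\left(\left\lceil\tfrac{n}{2m}\right\rceil - 2\right) - m\left(\left\lceil\tfrac{n}{2m}\right\rceil - 2\right)\left(\left\lceil\tfrac{n}{2m}\right\rceil - 3\right).
\end{aligned} \tag{2}
$$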
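Unrolling the recurrence for Mergesort with a buffer of $m$ elements can be cross-checked mechanically. The sketch below evaluates the recurrence step by step (each step merges a block of $2m$ elements at a cost of at most $n$ comparisons) and compares it with the unrolled form for $\lceil n/(2m)\rceil - 2$ steps. As a stand-in for $T_{MS}$ it uses the classical worst-case count $n\lceil\log n\rceil - 2^{\lceil\log n\rceil} + 1$ of top-down Mergesort; the function names are hypothetical.

```python
def t_ms(n):
    """Worst-case comparisons of top-down Mergesort on n >= 1 elements."""
    k = (n - 1).bit_length()  # equals ceil(log2(n)) for n >= 1
    return n * k - (1 << k) + 1


def t_ms_b_recurrence(n, m):
    """Evaluate T(n, m) = n + T(n - 2m, m) + T_MS(2m) for n > 4m, and T_MS(n) otherwise."""
    total = 0
    while n > 4 * m:
        total += n + t_ms(2 * m)
        n -= 2 * m
    return total + t_ms(n)


def t_ms_b_unrolled(n, m):
    """Closed form after k = ceil(n / (2m)) - 2 unrolling steps."""
    if n <= 4 * m:
        return t_ms(n)
    k = -(-n // (2 * m)) - 2
    return k * t_ms(2 * m) + t_ms(n - 2 * m * k) + n * k - m * k * (k - 1)
```

Both functions agree for all $n \ge 1$ and $m \ge 1$; for example, with $n = 8$ and $m = 1$ both give 21.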

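For a moment let us assume that $\frac{n}{2m} = \ell$ is an integer (with $\ell \ge 2$). In this case we have

$$
\begin{aligned}
T_{MS,b}\!\left(n,\tfrac{n}{2\ell}\right) &\le (\ell-2)\cdot T_{MS}\!\left(\tfrac{n}{\ell}\right) + T_{MS}\!\left(\tfrac{2n}{\ell}\right) + n\cdot(\ell-2) - \tfrac{n}{2\ell}\cdot(\ell-2)(\ell-3)\\
&\le (\ell-2)\cdot\tfrac{n}{\ell}\cdot\left(\log\!\left(\tfrac{n}{\ell}\right) - \kappa\right) + \tfrac{2n}{\ell}\cdot\left(\log\!\left(\tfrac{2n}{\ell}\right) - \kappa\right) + n\cdot(\ell-2)\cdot\left(\tfrac{1}{2} + \tfrac{3}{2\ell}\right)\\
&\le n\log n + n\cdot f(\ell)
\end{aligned}
$$

for $n$ large enough where $f$ is defined by

$$
f(\ell) = \begin{cases} -\kappa - \log\ell + \ell/2 + 1/2 - 1/\ell & \text{for } \ell \ge 2,\\ -\kappa & \text{otherwise}\end{cases}
$$

and $T_{MS}(n) \le n\log n - \kappa n$ is a bound for the number of comparisons of Mergesort for $n$ large enough ($\kappa \approx 0.91$). Now, for arbitrary $m$ we can use $n\log n + n\cdot f\!\left(\frac{n}{2m}\right)$ as an approximation, which turns out to be quite precise: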

###### Lemma 3.5:

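Let $f$ be defined as above and write $\frac{n}{2m} + \xi = \left\lceil\frac{n}{2m}\right\rceil$. Then for $n \ge m$ and $m$ large enough we have

$$
T_{MS,b}(n,m) \le n\log n + n\cdot f\!\left(\frac{n}{2m}\right) + m\cdot\epsilon(\xi)
$$

where $\epsilon(\xi) = 2(2-\xi)\log(2-\xi) + 5\xi - \xi^2 - 4$.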

###### Proof

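Let $m$ be large enough such that $T_{MS}(n') \le n'\log n' - \kappa n'$ for all $n' \ge m$. If $n \le 4m$, we have $T_{MS,b}(n,m) \le T_{MS}(n) \le n\log n - \kappa n$ and so the lemma holds. Now let $n > 4m$. By (2) we obtain

$$
\begin{aligned}
T_{MS,b}(n,m) \le{} & \left(\tfrac{n}{2m} + \xi - 2\right) T_{MS}(2m) + T_{MS}\bigl((2-\xi)2m\bigr) && (3)\\
& + n\cdot\left(\tfrac{n}{2m} + \xi - 2\right) - m\cdot\left(\tfrac{n}{2m} + \xi - 2\right)\left(\tfrac{n}{2m} + \xi - 3\right). && (4)
\end{aligned}
$$

We examine the two terms (3) and (4) separately using $T_{MS}(n) \le n\log n - \kappa n$:

$$
\begin{aligned}
(3) &= \left(\tfrac{n}{2m} + \xi - 2\right) T_{MS}(2m) + T_{MS}\bigl((2-\xi)2m\bigr)\\
&\le \left(\tfrac{n}{2m} + \xi - 2\right) 2m\,(\log 2m - \kappa) + (2-\xi)2m\,\bigl(\log((2-\xi)2m) - \kappa\bigr)\\
&= (n\log 2m - \kappa n) + (\xi - 2)2m\,(\log 2m - \kappa) + (2-\xi)2m\,\bigl(\log((2-\xi)2m) - \kappa\bigr)\\
&= (n\log n - \kappa n) - n\log\!\left(\tfrac{n}{2m}\right) + (2-\xi)2m\cdot\Bigl(\bigl(\log((2-\xi)2m) - \kappa\bigr) - (\log 2m - \kappa)\Bigr)\\
&= (n\log n - \kappa n) - n\log\!\left(\tfrac{n}{2m}\right) + (2-\xi)2m\log(2-\xi)
\end{aligned}
$$

and

$$
\begin{aligned}
(4) &= n\cdot\left(\tfrac{n}{2m} + \xi - 2\right) - m\cdot\left(\tfrac{n}{2m} + \xi - 2\right)\left(\tfrac{n}{2m} + \xi - 3\right)\\
&= n\cdot\left(\tfrac{n}{2m} - 2\right) - m\cdot\left(\tfrac{n}{2m} - 2\right)\left(\tfrac{n}{2m} - 3\right) + n\xi - m\left(\xi\left(\tfrac{n}{2m} - 3\right) + \left(\tfrac{n}{2m} - 2\right)\xi + \xi^2\right)\\
&= n\cdot\left(\tfrac{n}{2m} - 2 - \tfrac{m}{n}\cdot\left(\left(\tfrac{n}{2m}\right)^2 - 5\,\tfrac{n}{2m} + 6\right)\right) + m(5\xi - \xi^2)\\
&= n\cdot\left(\tfrac{n}{2\cdot 2m} + \tfrac{1}{2} - 3\cdot\tfrac{2m}{n}\right) + m(5\xi - \xi^2).
\end{aligned}
$$

Thus,

$$
T_{MS,b}(n,m) - \left(n\log n + n\cdot f\!\left(\tfrac{n}{2m}\right)\right) \le (2-\xi)2m\log(2-\xi) + m(5\xi - \xi^2) - 4m.
$$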

This completes the proof of Lemma 3.5.

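For selecting the pivot in QuickMergesort, we apply the procedure of Section 3.2 to $n/\theta$ elements (for some parameter $\theta \in \mathbb{R}$, $\theta \ge 1$): we select $n/\theta$ elements from the array, group them into groups of fifteen elements, compute the pseudomedian of each group, and take the median of those pseudomedians as pivot. We call this algorithm MoMQuickMergesort with undersampling factor $\theta$ ($\mathrm{MQMS}_{\theta}$). Note that $\mathrm{MQMS}_{1} = \mathrm{MQMS}$. For its worst-case number of comparisons we have

$$
\begin{aligned}
T_{MQMS_\theta}(n) \le \max_{\frac{1}{5\theta} \le \alpha \le \frac{1}{2}}\; & T_{MQMS_\theta}(\alpha n) + T_{MS,b}\bigl(n(1-\alpha), \alpha n\bigr)\\
& + \tfrac{22}{15\theta}n + \tfrac{20}{15\theta}n + \left(1 - \tfrac{1}{15\theta}\right)n + \mathcal{O}(n^{0.8})
\end{aligned}
$$

where the $\frac{22}{15\theta}n$ is for finding the pseudomedians of fifteen, the $\frac{20}{15\theta}n$ is for the median-of-medians algorithm called on $\frac{n}{15\theta}$ elements and $\left(1 - \frac{1}{15\theta}\right)n$ is for partitioning the remaining elements. Now we plug in the bound of Lemma 3.5 for $T_{MS,b}(n,m)$, with $\epsilon$ an upper bound on $\epsilon(\xi)$, and apply Lemma 2.1:

$$
\begin{aligned}
T_{MQMS_\theta}(n) \le \max_{\frac{1}{5\theta} \le \alpha \le \frac{1}{2}}\; & T_{MQMS_\theta}(\alpha n) + (1-\alpha)n\log\bigl((1-\alpha)n\bigr)\\
& + (1-\alpha)n\left(f\!\left(\tfrac{1-\alpha}{2\alpha}\right) + \epsilon\right) + n\cdot\left(1 + \tfrac{41}{15\theta}\right) + \mathcal{O}(n^{0.8})\\
\le{} & n\log n + n\cdot\max_{\frac{1}{5\theta} \le \alpha \le \frac{1}{2}} g(\alpha,\theta) + \mathcal{O}(n^{0.8})
\end{aligned}
$$

for

$$
g(\alpha,\theta) = \frac{\alpha\log(\alpha)}{1-\alpha} + \log(1-\alpha) + f\!\left(\frac{1-\alpha}{2\alpha}\right) + \frac{1}{1-\alpha}\cdot\left(1 + \frac{41}{15\theta}\right) + \epsilon.
$$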

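In order to find a good undersampling factor, we wish to find a value for $\theta$ minimizing $\max_{\alpha} g(\alpha,\theta)$. While we do not have a formal proof, intuitively the maximum should be either reached for $\alpha = \frac{1}{5\theta}$ (if $\theta$ is small) or for $\alpha = \frac{1}{2}$ (if $\theta$ is large) – see Figure 4 for a special value of $\theta$. Moreover, notice that we are dealing with an upper bound on $T_{MQMS_\theta}(n)$ only (with a small error due to Lemma 3.5 and the bound for $T_{MS}(n)$), so even if we could find the $\theta$ which minimizes $\max_{\alpha} g(\alpha,\theta)$, this might not be optimal.

We proceed as follows: first, we compute the point where the two curves $g\bigl(\frac{1}{2},\theta\bigr)$ and $g\bigl(\frac{1}{5\theta},\theta\bigr)$ in Figure 3 intersect. For this particular value of $\theta$, we then show that indeed the maximum of $g(\alpha,\theta)$ over $\alpha$ is attained at these two endpoints. Since $g\bigl(\frac{1}{2},\theta\bigr)$ is monotonically decreasing in $\theta$ (this is obvious) and $g\bigl(\frac{1}{5\theta},\theta\bigr)$ is monotonically increasing in the relevant range of $\theta$ (verified numerically), this together shows that $\max_{\alpha} g(\alpha,\theta)$ is minimized at the intersection point.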

We compute the intersection point numerically as . For we verify (using WolframAlpha ), that the maximum is attained at and that and . Thus, we have established the optimality of even though we have not computed for and . (In the mathoverflow question , this value is verified analytically – notice that there is slightly different giving a different .)

For implementation reasons we propose $\theta = 11/5 = 2.2$ – a choice which is only slightly smaller than the optimal value and confirmed experimentally (Figure 5). Again for this fixed $\theta$, we verify that the maximum is indeed attained at $\alpha = \frac{1}{2}$, see Figure 4. Thus, up to the small difference to the intersection point, we know that $\theta = 11/5$ is optimal.

For this fixed value of $\theta$ we have thus computed $\max_{\alpha} g(\alpha,\theta)$, which in turn gives us a bound on $T_{MQMS_\theta}(n)$.

###### Theorem 3.6:

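MoMQuickMergesort with undersampling factor $\theta = 11/5$ ($\mathrm{MQMS}_{11/5}$) runs in $\mathcal{O}(n\log n)$ time and performs at most $n\log n + 1.59n + \mathcal{O}(n^{0.8})$ comparisons.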
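The maximization over $\alpha$ can be reproduced numerically. The sketch below evaluates the function $g(\alpha,\theta) = \frac{\alpha\log\alpha}{1-\alpha} + \log(1-\alpha) + f\bigl(\frac{1-\alpha}{2\alpha}\bigr) + \frac{1}{1-\alpha}\bigl(1 + \frac{41}{15\theta}\bigr) + \epsilon$ on a grid. It rests on assumptions that are not fixed by the text: $\kappa = 0.91$ is used for the Mergesort bound, and $\epsilon$ is taken as the maximum of $2(2-\xi)\log(2-\xi) + 5\xi - \xi^2 - 4$ over $\xi \in [0,1)$, estimated on a grid. The names `KAPPA`, `EPS`, and `worst_alpha` are hypothetical.

```python
from math import log2

KAPPA = 0.91  # assumed constant in the Mergesort bound n log n - KAPPA * n


def f(l):
    """Overhead of imbalanced merging relative to n log n, per element."""
    if l < 2:
        return -KAPPA
    return -KAPPA - log2(l) + l / 2 + 0.5 - 1 / l


def eps_of_xi(xi):
    """Rounding error term for a fractional part xi in [0, 1)."""
    return 2 * (2 - xi) * log2(2 - xi) + 5 * xi - xi * xi - 4


EPS = max(eps_of_xi(i / 1000) for i in range(1000))  # grid estimate of the maximum


def g(alpha, theta):
    """Linear coefficient of the worst-case bound when every pivot has relative rank alpha."""
    return (alpha * log2(alpha) / (1 - alpha) + log2(1 - alpha)
            + f((1 - alpha) / (2 * alpha))
            + (1 + 41 / (15 * theta)) / (1 - alpha) + EPS)


def worst_alpha(theta, steps=2000):
    """Grid search for the alpha in [1/(5 theta), 1/2] that maximizes g."""
    lo = 1 / (5 * theta)
    grid = [lo + (0.5 - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda a: g(a, theta))
```

Under these assumptions, `worst_alpha(11 / 5)` returns the right endpoint $\alpha = \frac{1}{2}$ and `g(0.5, 11 / 5)` evaluates to a value just below 1.59.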

### 3.4 Heuristic estimate of the average case

It is hard to calculate an exact average case since only at the first stage of the execution of the algorithm are we dealing with random inputs. We still estimate the average case by assuming that all intermediate arrays are random and applying some more heuristic arguments.

#### Average of the median-of-medians algorithm.

On average we can expect that the pivot returned from the median-of-medians procedure is very close to an actual median, which gives us an easy recurrence showing that the average number of comparisons is at most $\frac{40}{7}n + o(n)$. However, we have to take adaptive pivot selection into account. The first pivot is the $(n/2 \pm o(n))$-th element with very high probability. Thus, the recursive call is on $n/2 + o(n)$ elements with $k \in o(n)$ (or $k \in n/2 - o(n)$; by symmetry we assume the first case). Due to adaptive pivot selection, the array will then be split into a left part of size $o(n)$ (with the element we are looking for in it – this is guaranteed even in the worst case) and a larger right part. This is because an order $o(n)$ element of the pseudomedians of nine is also an order $o(n)$ element of the whole array. Thus, all successive recursive calls will be made on arrays of size $o(n)$. We denote the average number of comparisons of the median-of-medians algorithm recursing on an array of size $o(n)$ as $T^{noRec}_{av,MoM}(n)$.

We also have to take the recursive calls for pivot selection into account. The first pivot is the median of the sample; thus, the same reasoning as for $T_{av,MoM}$ applies. The second pivot is an element of order $o(n)$ out of $n/9$ elements, so we are in the situation of $T^{noRec}_{av,MoM}$. Thus, we get

$$
T_{av,MoM}(n) = T^{noRec}_{av,MoM}\!\left(\frac{n}{2}\right) + T_{av,MoM}\!\left(\frac{n}{9}\right) + \frac{20n}{9} \quad\text{and}\quad T^{noRec}_{av,MoM}(n) = T^{noRec}_{av,MoM}\!\left(\frac{n}{9}\right) + \frac{20n}{9} + o(n).
$$

Hence, by Lemma 2.1, we obtain $T^{noRec}_{av,MoM}(n) = \frac{5n}{2} + o(n)$ and

$$
T_{av,MoM}(n) = T_{av,MoM}(n/9) + \frac{20n}{9} + \frac{5n}{4} + o(n) = \frac{125}{32}n + o(n) \le 4n + o(n).
$$

#### Average of MoMQuickMergesort.

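As for the median-of-medians algorithm, we can expect that the pivot in MoMQuickMergesort is always very close to the median. Using the bound $T_{av,MoM}(n) \le 4n + o(n)$ for the adaptive version of the median-of-medians algorithm, we obtain

$$
\begin{aligned}
T_{av,MQMS_\theta}(n) &= T_{av,MQMS_\theta}(n/2) + \tfrac{n}{2}\log(n/2) - 1.24\cdot\tfrac{n}{2} + \tfrac{22}{15\theta}n + \tfrac{4}{15\theta}n + \tfrac{15\theta - 1}{15\theta}n + o(n)\\
&\le n\log n + n\cdot\left(-1.24 + \tfrac{10}{3\theta}\right) + o(n)
\end{aligned}
$$

by Lemma 2.1 (here the $\frac{4}{15\theta}n$ is for the average case of the median-of-medians algorithm, the other terms as before). This yields

$$
T_{av,MQMS}(n) \le n\log n + 2.094n + o(n)
$$

(for $\theta = 1$). For our proposed $\theta = \frac{11}{5}$ we have

$$
T_{av,MQMS_{11/5}}(n) \le n\log n + 0.275n + o(n).
$$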
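The dependence of the heuristic average-case estimate on the undersampling factor is a one-line formula, $-1.24 + \frac{10}{3\theta}$, which the following sketch evaluates. The function name is hypothetical.

```python
def average_linear_coefficient(theta):
    """Heuristic coefficient c(theta) in the average-case estimate n log n + c(theta) * n."""
    return -1.24 + 10 / (3 * theta)
```

For `theta = 1` this gives about 2.093 and for `theta = 11 / 5` about 0.275. The coefficient decreases as fewer elements are sampled, which matches the trade-off described for the worst case, where too large a value of `theta` makes the bound worse again.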

### 3.5 Hybrid algorithms

In order to achieve an even better average case, we can apply a trick similar to Introsort. Be aware, however, that this deteriorates the worst case slightly. We fix some small $\delta > 0$. The algorithm starts by executing QuickMergesort with median-of-three pivot selection. Whenever the pivot is contained in the interval $[\delta n, (1-\delta)n]$, the next pivot is selected again as median of three, otherwise according to Section 3.3 (as median of pseudomedians of $n/\theta$ elements) – for the following pivots it switches back to median of three. When choosing $\delta$ not too small, the worst-case number of comparisons will be only approximately $2n$ more than that of MoMQuickMergesort with undersampling (because in the worst case before every partitioning step according to MoMQuickMergesort with undersampling, there will be one partitioning step with median-of-three using $n$ comparisons), while the average is almost as good as that of QuickMergesort with median-of-three. We call this algorithm hybrid QuickMergesort (HQMS).

Another possibility for a hybrid algorithm is to use MoMQuickMergesort (with undersampling) instead of Heapsort as a worst-case stopper for Introsort. We test both variants in our experiments.

### 3.6 Summary of algorithms

For the reader’s convenience we provide a short summary of the different versions of MoMQuickMergesort and the results we obtained in Table 1.

## 4 Experiments

#### Experimental setup.

We ran thorough experiments with implementations in C++ on different kinds of input permutations. The experiments were run on an Intel Core i5-2500K CPU (3.30GHz, 4 cores, 32KB L1 instruction and data cache, 256KB L2 cache per core and 6MB L3 shared cache) with 16GB RAM and operating system Ubuntu Linux 64bit version 14.04.4. We compiled with GNU’s g++ (4.8.4) using the optimization flags -O3 -march=native. For time measurements, we used std::chrono::high_resolution_clock; for generating random inputs, we used the Mersenne Twister pseudo-random generator std::mt19937. All time measurements were repeated with the same 100 deterministically chosen seeds – the displayed numbers are the averages of these 100 runs. Moreover, for each time measurement, at least 128MB of data were sorted – if the array size is smaller, then several arrays were sorted for this time measurement and the total elapsed time was measured. If not specified explicitly, all experiments were conducted with 32-bit integers.
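A measurement of this kind could be organized roughly as follows. This is a minimal sketch and not the authors' harness: the names `kMinBytes`, `arrays_per_measurement` and `time_one_seed` are hypothetical, "128MB" is assumed to mean 128 · 2^20 bytes, and the inputs are assumed to be regenerated between repetitions with generation excluded from the timing.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Assumption: "128MB" is read as 128 * 2^20 bytes.
constexpr std::size_t kMinBytes = std::size_t{128} << 20;

// Number of arrays of n elements needed so that at least kMinBytes are sorted.
std::size_t arrays_per_measurement(std::size_t n,
                                   std::size_t elem_size = sizeof(std::int32_t)) {
    assert(n > 0 && elem_size > 0);
    const std::size_t bytes = n * elem_size;
    return std::max<std::size_t>(1, (kMinBytes + bytes - 1) / bytes);
}

// Sorts arrays_per_measurement(n) random arrays of n 32-bit integers drawn
// from std::mt19937 seeded with `seed` and returns the total time spent
// inside sort_fn, in seconds. Input generation is not timed.
template <class SortFn>
double time_one_seed(std::size_t n, std::uint32_t seed, SortFn sort_fn) {
    std::mt19937 gen(seed);
    const std::size_t reps = arrays_per_measurement(n);
    std::vector<std::int32_t> data(n);
    std::chrono::duration<double> total{0};
    for (std::size_t r = 0; r < reps; ++r) {
        for (auto& x : data) x = static_cast<std::int32_t>(gen());
        const auto start = std::chrono::high_resolution_clock::now();
        sort_fn(data.begin(), data.end());
        const auto stop = std::chrono::high_resolution_clock::now();
        total += stop - start;
    }
    return total.count();
}
```

Averaging over a fixed list of 100 seeds then amounts to calling `time_one_seed(n, seed, ...)` once per seed, for instance with a lambda wrapping `std::sort`, and taking the mean of the returned values.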

#### Implementation details.

The code of our implementation of MoMQuickMergesort as well as the other algorithms and our running time experiments is available at https://github.com/weissan/QuickXsort. In our implementation of MoMQuickMergesort, we use the merging procedure from , which avoids branch mispredictions. We use the partitioner from the libstdc++ implementation of std::sort. For the running time experiments, base cases up to 42 elements are sorted with Insertionsort. For the comparison measurements Mergesort is used down to size one arrays.
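The base-case handling for the running time experiments can be illustrated as follows. This is a generic sketch of an Insertionsort cutoff at 42 elements; the names `kBaseCaseSize`, `insertion_sort` and `sort_if_base_case` are hypothetical and the code is not taken from the linked repository.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>

// Cutoff below which ranges are handled by Insertionsort (value from the text).
constexpr std::ptrdiff_t kBaseCaseSize = 42;

// Plain Insertionsort on a bidirectional range, using operator<.
template <class It>
void insertion_sort(It first, It last) {
    if (first == last) return;
    for (It i = std::next(first); i != last; ++i) {
        auto value = std::move(*i);
        It j = i;
        // Shift larger elements one position to the right.
        while (j != first && value < *std::prev(j)) {
            *j = std::move(*std::prev(j));
            --j;
        }
        *j = std::move(value);
    }
}

// Sorts the range and returns true if it is small enough to be a base case;
// otherwise leaves it untouched and returns false so the caller can recurse.
template <class It>
bool sort_if_base_case(It first, It last) {
    assert(std::distance(first, last) >= 0);
    if (std::distance(first, last) > kBaseCaseSize) return false;
    insertion_sort(first, last);
    return true;
}
```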

#### Simulation of a worst case.

In order to experimentally confirm our worst-case bounds for MoMQuickMergesort, we simulate a worst case. Be aware that it is not even clear whether in reality there are input permutations for which the worst-case bounds of Section 3 are tight, since at the time pivots are selected the array is already pre-sorted in a particular way (which is hard to capture in a thorough analysis). Actually, in  it is conjectured that similar bounds for different variants of the median-of-medians algorithm are not tight. Therefore, we cannot test the worst case by designing particularly bad inputs. Nevertheless, we can simulate a worst-case scenario where every pivot is chosen in the worst way possible (according to the theoretical analysis). More precisely, the simulation of the worst case comprises the following aspects:

• For computing the -th element of a small array (up to 30 elements) we additionally sort it with Heapsort. This is because our implementation uses Introselect (std::nth_element) for arrays of size up to 30.

• When measuring comparisons, we perform a random shuffle before every call to Mergesort. As the average case of Mergesort is close to its worst case (up to approximately ), this gives a fairly good approximation of the worst case. For measuring time we apply a simplified shuffling method, which shuffles only a few positions in the array.

• In the median-of-medians algorithm, we do not use the pivot selected by recursive calls, but use std::nth_element to find the worst pivot the recursive procedure could possibly select. We do not count comparisons incurred by std::nth_element. This is the main contribution to the worst case.

• As pivot for QuickMergesort (the basic and improved variant) we always use the real median (this is actually the worst possibility as the recursive call of QuickMergesort is guaranteed to be on the smaller half and Mergesort is not slower than QuickMergesort). In the version with undersampling we use the most extreme pivot (since this is worse than the median).

• We also make 100 measurements for each data point. When counting comparisons, we take the maximum over all runs instead of the mean. However, this makes only a negligible difference (as the small standard deviation in Table 2 suggests). When measuring running times we still take the mean since the maximum reflects only the large standard deviation of Quickselect (std::nth_element), which we use to find bad pivots.

The simulated worst cases are always drawn as dashed lines in the plots (except in Figure 5).
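Two ingredients of the simulation described in the list above, forcing a worst admissible pivot with std::nth_element and keeping those comparisons out of the count, can be sketched as follows. The names `CountingLess` and `place_worst_pivot` are hypothetical, and the sketch assumes that the caller already knows the most extreme rank the analysis allows; it is not the code from the authors' repository.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Comparator that counts its invocations. It is meant to be passed only to
// the algorithm under test, so that simulation overhead is not counted.
struct CountingLess {
    std::size_t* counter;
    bool operator()(int a, int b) const {
        ++*counter;
        return a < b;
    }
};

// Moves the element of rank worst_rank (0-based) to index worst_rank and
// returns it. std::nth_element is called with the default operator<, so
// none of its comparisons reach a CountingLess counter.
int place_worst_pivot(std::vector<int>& a, std::size_t worst_rank) {
    assert(worst_rank < a.size());
    std::nth_element(a.begin(),
                     a.begin() + static_cast<std::ptrdiff_t>(worst_rank),
                     a.end());
    return a[worst_rank];
}
```

In such a setup the partitioning step would then use the returned element as its pivot in place of the one found by the recursive median-of-medians call.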

#### Different undersampling factors.

In Figure 5, we compare the (simulated) worst-case number of comparisons for different undersampling factors . The picture resembles the one in Figure 3. However, all numbers are around smaller than in Figure 3 because we used the average case of Mergesort to simulate its worst case. Also, depending on the array size , the point where the two curves for and meet differs ( as in Section 3.3). Still the minimum is always achieved between 2.1 and 2.3 (recall that we have to take the maximum of the two curves for the same ) – confirming the calculations in Section 3.3 and suggesting as a good choice for further experiments.