QuickXsort: Efficient Sorting with $n \log n - 1.399n + o(n)$ Comparisons on Average
Abstract
In this paper we generalize the idea of QuickHeapsort leading to the notion of QuickXsort. Given some external sorting algorithm X, QuickXsort yields an internal sorting algorithm if X satisfies certain natural conditions.
With QuickWeakHeapsort and QuickMergesort we present two examples for the QuickXsort construction. Both are efficient algorithms that incur approximately $n \log n - 1.26n + o(n)$ comparisons on average. A worst case of $n \log n + O(n)$ comparisons can be achieved without significantly affecting the average case.
Furthermore, we describe an implementation of MergeInsertion for small $n$. Taking MergeInsertion as a base case for QuickMergesort, we establish a worst-case efficient sorting algorithm calling for $n \log n - 1.3999n + o(n)$ comparisons on average. QuickMergesort with constant size base cases shows the best performance on practical inputs: when sorting integers it is only 15% slower than STL-Introsort.
1 Introduction
Sorting a sequence of $n$ elements remains one of the most frequent tasks carried out by computers. A lower bound for sorting by only pairwise comparisons is $\log(n!) \approx n \log n - 1.44n + O(\log n)$ comparisons for the worst and average case (logarithms referred to by $\log$ are base 2; the average case refers to a uniform distribution of all input permutations, assuming all elements are different).
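The information-theoretic bound can be checked numerically. The following is a minimal sketch, not part of the paper: it compares $\log_2(n!)$ with the leading terms $n \log_2 n - n \log_2 e$ obtained from Stirling's approximation, where $\log_2 e \approx 1.4427$ is the source of the constant 1.44. The function names are chosen here for illustration only.

```python
import math


def log2_factorial(n: int) -> float:
    """Return log2(n!) via the log-gamma function (lgamma(n + 1) = ln n!)."""
    return math.lgamma(n + 1) / math.log(2)


def leading_terms(n: int) -> float:
    """Return n*log2(n) - n*log2(e), the two leading terms of log2(n!)."""
    return n * math.log2(n) - n * math.log2(math.e)


if __name__ == "__main__":
    for n in (10, 1000, 10**6):
        exact = log2_factorial(n)
        approx = leading_terms(n)
        # The difference grows only logarithmically in n.
        print(n, round(exact, 1), round(approx, 1), round(exact - approx, 2))
```

The remaining gap between the two values is about $\tfrac{1}{2}\log_2(2\pi n)$, which is the logarithmic lower-order term.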
Sorting algorithms that are optimal in the leading term are called constant-factor-optimal. Table 1 lists some milestones in the race for reducing the coefficient in the linear term. One of the most efficient (in terms of number of comparisons) constant-factor-optimal algorithms for solving the sorting problem is Ford and Johnson’s MergeInsertion algorithm [7]. It requires $n \log n - 1.329n + O(\log n)$ comparisons in the worst case [10]. MergeInsertion has a severe drawback that makes it uninteresting for practical use: similar to Insertionsort, the number of element moves is quadratic in $n$. By Insertionsort we mean the algorithm that inserts all elements successively into the already ordered sequence, finding the position for each element by binary search (not by linear search, as is mostly done). However, MergeInsertion and Insertionsort can be used to sort small subarrays such that the quadratic running time for these subarrays is small in comparison to the overall running time.
Reinhardt [12] used this technique to design an internal Mergesort variant that needs $n \log n - 1.32n + O(\log n)$ comparisons in the worst case. Unfortunately, implementations of this InPlaceMergesort algorithm have not been documented. The work of Katajainen et al. [9, 6], inspired by Reinhardt, is practical, but the number of comparisons is larger.
Throughout the text we avoid the terms in-place or in-situ and prefer the term internal (as opposed to external). We call an algorithm internal if it needs at most $O(\log n)$ space in addition to the array to be sorted. That means we consider Quicksort as an internal algorithm, whereas standard Mergesort is external because it needs a linear amount of extra space.
Based on QuickHeapsort [1], in this paper we develop the concept of QuickXsort and apply it to other sorting algorithms such as Mergesort or WeakHeapsort. This yields efficient internal sorting algorithms. The idea is very simple: as in Quicksort, the array is partitioned into the elements greater and less than some pivot element. Then one part of the array is sorted by some algorithm X and the other part is sorted recursively. The advantage of this procedure is that, if X is an external algorithm, then in QuickXsort the part of the array which is not currently being sorted may be used as temporary space, which yields an internal variant of X. We show that under natural assumptions QuickXsort performs, up to $o(n)$ terms, on average the same number of comparisons as X.
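Table 1: Coefficient of the linear term in the number of comparisons.

|  | Mem. | Other | Worst | Avg. | Exper. |
|---|---|---|---|---|---|
| Lower bound |  |  | -1.44 | -1.44 |  |
| BottomUpHeapsort [13] |  |  |  | – | [0.35,0.39] |
| WeakHeapsort [3, 5] |  |  | 0.09 | – | [-0.46,-0.42] |
| RelaxedWeakHeapsort [4] |  |  | -0.91 | -0.91 | -0.91 |
| Mergesort [10] |  |  | -0.91 | -1.26 | – |
| ExternalWeakHeapsort # |  |  | -0.91 | -1.26* | – |
| Insertionsort [10] |  |  | -0.91 | -1.38 # | – |
| MergeInsertion [10] |  |  | -1.32 | -1.3999 # | [-1.43,-1.41] |
| InPlaceMergesort [12] |  |  | -1.32 | – | – |
| QuickHeapsort [1, 2] |  |  |  | -0.03 | 0.20 |
|  |  |  |  | -0.99 | -1.24 |
| QuickMergesort (IS) # |  |  | -0.32 | -1.38 | – |
| QuickMergesort # |  |  | -0.32 | -1.26 | [-1.29,-1.27] |
| QuickMergesort (MI) # |  |  | -0.32 | -1.3999 | [-1.41,-1.40] |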
Abbreviations: # in this paper, MI MergeInsertion, – not analyzed, * for , : computer word width in bits; we assume .
For QuickXsort we assume InPlaceMergesort as a worstcase stopper (without ).
The concept of QuickXsort (without calling it that) was first applied in UltimateHeapsort by Katajainen [8]. In UltimateHeapsort, first the median of the array is determined, and then the array is partitioned into subarrays of equal size. Finding the median requires significant additional effort. Cantone and Cincotti [1] weakened the requirement for the pivot and designed QuickHeapsort, which uses only a sample of smaller size to select the pivot for partitioning. UltimateHeapsort is inferior to QuickHeapsort in terms of average case running time, although, unlike QuickHeapsort, it allows an $n \log n + O(n)$ bound for the worst case number of comparisons. Diekert and Weiß [2] analyzed QuickHeapsort more thoroughly and showed that it needs fewer than $n \log n - 0.99n + o(n)$ comparisons in the average case when implemented with approximately $\sqrt{n}$ elements as sample for pivot selection and some other improvements.
Edelkamp and Stiegeler [4] applied the idea of QuickXsort to WeakHeapsort (which was first described by Dutton [3]), introducing QuickWeakHeapsort. The worst case number of comparisons of WeakHeapsort is at most $n \log n + 0.09n$, and, following Edelkamp and Wegener [5], this bound is tight. In [4] an improved variant with $n \log n - 0.91n$ comparisons in the worst case and requiring extra space is presented. With ExternalWeakHeapsort we propose a further refinement with the same worst case bound, but requiring approximately $n \log n - 1.26n$ comparisons on average. Using ExternalWeakHeapsort as X in QuickXsort we obtain an improvement over QuickWeakHeapsort of [4].
As indicated above, Mergesort is another good candidate for the QuickXsort construction. With QuickMergesort we describe an internal variant of Mergesort which is almost as good as Mergesort not only in terms of number of comparisons, but also in terms of running time. As mentioned before, MergeInsertion can be used to sort small subarrays. We study MergeInsertion and provide an implementation based on weak heaps. Furthermore, we give an average case analysis. When sorting small subarrays with MergeInsertion, we can show that the average number of comparisons performed by Mergesort is bounded by $n \log n - 1.3999n + o(n)$, and, therefore, QuickMergesort uses at most $n \log n - 1.3999n + o(n)$ comparisons in the average case.
2 QuickXsort
In this section we give a more precise description of QuickXsort and derive some results concerning the number of comparisons performed in the average and worst case. Let X be some sorting algorithm. QuickXsort works as follows: First, choose some pivot element as median of some random sample. Next, partition the array according to this pivot element, i.e., rearrange the array such that all elements left of the pivot are less than or equal to the pivot and all elements on the right are greater than or equal to the pivot. Then, choose one part of the array and sort it with algorithm X. (In general, it does not matter whether the smaller or larger half of the array is chosen. However, for a specific sorting algorithm X like Heapsort, there might be a better and a worse choice.) After one part of the array has been sorted with X, move the pivot element to its correct position (right after/before the already sorted part) and sort the other part of the array recursively with QuickXsort.
The main advantage of this procedure is that the part of the array that is not currently being sorted can be used as temporary memory for the algorithm X. This yields fast internal variants for various external sorting algorithms (such as Mergesort). The idea is that whenever a data element should be moved to the external storage, it is instead swapped with some data element in the part of the array which is not currently being sorted. Of course, this works only if the algorithm needs additional storage only for data elements. Furthermore, the algorithm has to be able to keep track of the positions of elements which have been swapped. As the specific method depends on the algorithm X, we give some more details when we describe the examples for QuickXsort.
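The control flow described above can be sketched as follows. This is an illustrative skeleton only, under several assumptions that are not taken from the paper: the stand-in for X is a plain insertion sort that needs no buffer (so the swapping of elements into the unsorted part is not shown), the pivot is the median of a random sample of about $\sqrt{n}$ elements, and the larger part is always handed to X. All function names are hypothetical.

```python
import math
import random


def insertion_sort_range(a, lo, hi):
    """Stand-in for algorithm X: sort a[lo:hi] in place."""
    for i in range(lo + 1, hi):
        x = a[i]
        j = i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def choose_pivot_index(a, lo, hi):
    """Index of the median of a random sample of roughly sqrt(n) elements."""
    n = hi - lo
    k = min(n, math.isqrt(n) | 1)          # odd sample size, at least 1
    sample = random.sample(range(lo, hi), k)
    sample.sort(key=lambda i: a[i])
    return sample[k // 2]


def partition(a, lo, hi, p):
    """Partition a[lo:hi] around a[p]; return the pivot's final index."""
    a[p], a[hi - 1] = a[hi - 1], a[p]
    pivot = a[hi - 1]
    store = lo
    for i in range(lo, hi - 1):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi - 1] = a[hi - 1], a[store]
    return store


def quick_x_sort(a, sort_x=insertion_sort_range):
    """Sort list a in place: one part goes to X, the other is handled next."""
    lo, hi = 0, len(a)
    while hi - lo > 1:
        p = partition(a, lo, hi, choose_pivot_index(a, lo, hi))
        if p - lo >= hi - (p + 1):
            sort_x(a, lo, p)        # X sorts the left part
            lo = p + 1              # continue with the right part
        else:
            sort_x(a, p + 1, hi)    # X sorts the right part
            hi = p                  # continue with the left part
    return a
```

The recursion on the remaining part is written as a loop, so the skeleton itself needs only constant extra space beyond what X uses.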
For the number of comparisons we can derive some general results which hold for a wide class of algorithms X. Under natural assumptions the average case numbers of comparisons of X and of QuickXsort differ only by an $o(n)$ term. For the rest of the paper, we assume that the pivot is selected as the median of approximately $\sqrt{n}$ randomly chosen elements. Sample sizes of approximately $\sqrt{n}$ are likely to be optimal, as the results in [2, 11] suggest.
Theorem 1 (QuickXsort AverageCase).
Let X be some sorting algorithm requiring at most $n \log n + cn + o(n)$ comparisons in the average case. Then, QuickXsort implemented with $\Theta(\sqrt{n})$ elements as sample for pivot selection is a sorting algorithm that also needs at most $n \log n + cn + o(n)$ comparisons in the average case.
For the proofs we assume that the arrays are indexed starting with . The following lemma is crucial for our estimates. It can be derived by applying Chernoff bounds or by direct elementary calculations.
Lemma 1 ([2, Lm. 2]).
Let . If we choose the pivot as median of elements such that , then we have where .
Proof of Thm. 1.
Let denote the average number of comparisons performed by QuickXsort on an input array of length and let with be an upper bound for the average number of comparisons performed by the algorithm X on an input array of length . Without loss of generality we may assume that is monotone. We are going to show by induction that
for some monotonically increasing with which we will specify later.
Let with , i. e., is some function tending slowly to zero for . Because of , we see that tends to zero if . Hence, by Lem. 1 it follows that the probability that the pivot is more than off the median tends to zero for . In the following we write and . We obtain the following recurrence relation:
The function , has its only minimum in the interval at , i. e., for it decreases monotonically and for it increases monotonically. We set . That means that we have for and for . Using this observation, the induction hypothesis, and our assumptions, we conclude
With as above we obtain:
(1) 
We subtract on both sides and then divide by . Let be some constant such that for all (which exists since for all and for ). Then, we obtain
where the last inequality follows from for . We see that if satisfies the inequality
We choose as small as possible. Inductively, we can show that for every there is some such that . Hence, the theorem follows. ∎
Does QuickXsort provide a good bound for the worst case? The obvious answer is “no”. If always the smallest elements are chosen for pivot selection, a running time of is obtained. However, we can prove that such a worst case is very unlikely. In fact, let be the worst case number of comparisons of the algorithm X. Prop. 1 states that the probability that QuickXsort needs more than comparisons decreases exponentially in . (This bound is not tight, but since we do not aim for exact probabilities, Prop. 1 is enough for us.)
Proposition 1.
Let . The probability that QuickXsort needs more than comparisons is less than for large enough.
Proof.
Let be the size of the input. We say that we are in a good case if an array of size is partitioned in the interval , i. e., if the pivot is chosen in that interval. We can obtain a bound for the desired probability by estimating the probability that we always are in such a good case until the array contains only elements. For smaller arrays, we can assume an upper bound of comparisons for the worst case. For all partitioning steps that sums up to less than comparisons if we are always in a good case. We also have to consider the number of comparisons required to find the pivot element. At any stage the pivot is chosen as median of at most elements. Since the median can be determined in linear time, for all stages together this sums up to less than comparisons if we are always in a good case and is large enough. Finally, for all the sorting phases with X we need at most comparisons in total (that is only a rough upper bound which can be improved as in the proof of Thm. 1). Hence, we need at most comparisons if always a good case occurs.
Now, we only have to estimate the probability that always a good case occurs. By Lem. 1, the probability for a good case in the first partitioning step is at least for some constant . We have to choose times a pivot in the interval , then the array has size less than . We only have to consider partitioning steps where the array has size greater than (if the size of the array is already less than we define the probability of a good case as ). Hence, for each of these partitioning steps we obtain that the probability for a good case is greater than . Therefore, we obtain
by Bernoulli’s inequality. For large enough we have . ∎
To obtain a provable bound for the worst case complexity we apply a simple trick. We fix some worst case efficient sorting algorithm Y. This might be, e. g., InPlaceMergesort. Worst case efficient means that we have a bound for the worst case number of comparisons. We choose some slowly decreasing function , e. g., . Now, whenever the pivot is more than off the median, we switch to the algorithm Y. We call this QuickXYsort. To achieve a good worst case bound, of course, we also need a good bound for algorithm X. W. l. o. g. we assume the same worst case bounds for X as for Y. Note that QuickXYsort only makes sense if one needs a provably good worst case bound. Since QuickXsort is always expected to make at most as many comparisons as QuickXYsort (under the reasonable assumption that X on average is faster than Y – otherwise one would use simply Y), in every step of the recursion QuickXsort is the better choice for the average case.
In order to obtain an efficient internal sorting algorithm, of course, Y has to be internal and X may use at most $n$ extra spaces for an array of size $n$.
Theorem 2 (QuickXYsort WorstCase).
Let X be a sorting algorithm with at most comparisons in the average case and comparisons in the worst case (). Let Y be a sorting algorithm with at most comparisons in the worst case. Then, QuickXYsort is a sorting algorithm that performs at most comparisons in the average case and comparisons in the worst case.
Proof.
Since the proof is very similar to the proof of Thm. 1, we provide only a sketch. By replacing by with in the right side of (2) in the proof of Thm. 1 we obtain for the average case:
As in Thm. 1 the statement for the average case follows.
For the worst case, there are two possibilities: either the algorithm already fails the condition in the first partitioning step or it does not. In the first case, it is immediate that we have a worst case bound of , which also is tight. Note that we assume that we can choose the pivot element in time which is no real restriction, since the median of elements can be found in time. In the second case, we assume by induction that for for some and obtain a recurrence relation similar to (2) in the proof of Thm. 1:
By the same arguments as above the result follows. ∎
3 QuickWeakHeapsort
In this section we consider QuickWeakHeapsort as a first example of QuickXsort. We start by introducing weak heaps and then continue by describing WeakHeapsort and a novel external version of it. This external version is a good candidate for QuickXsort and yields an efficient sorting algorithm that uses approximately comparisons (this value is only a rough estimate and neither a bound from below nor above). A drawback of WeakHeapsort and its variants is that they require one extra bit per element. The exposition also serves as an intermediate step towards our implementation of MergeInsertion, where the weak-heap data structure will be used as a building block.
Conceptually, a weak heap (see Fig. 1) is a binary tree satisfying the following conditions:

1. The root of the entire tree has no left child.

2. Except for the root, the nodes that have at most one child are in the last two levels only. Leaves at the last level can be scattered, i.e., the last level is not necessarily filled from left to right.

3. Each node stores an element that is smaller than or equal to every element stored in its right subtree.
From the first two properties we deduce that the height of a weak heap that has $n$ elements is $\lceil \log n \rceil + 1$. The third property is called the weak-heap ordering or half-tree ordering. In particular, this property enforces no relation between an element in a node and those stored in its left subtree. On the other hand, it implies that any node together with its right subtree forms a weak heap on its own. In an array-based implementation, besides the element array $a$, an array $r$ of reverse bits is used, i.e., $r_i \in \{0, 1\}$ for $i \in \{0, \dots, n-1\}$. The root has index $0$. The array index of the left child of $a_i$ is $2i + r_i$, the array index of the right child is $2i + 1 - r_i$, and the array index of the parent is $\lfloor i/2 \rfloor$ (assuming that $i \neq 0$). Using the fact that the indices of the left and right children of $a_i$ are exchanged when flipping $r_i$, subtrees can be reversed in constant time by setting $r_i \leftarrow 1 - r_i$. The distinguished ancestor ($\text{d-ancestor}(j)$) of $a_j$ for $j \neq 0$ is recursively defined as the parent of $a_j$ if $a_j$ is a right child, and the distinguished ancestor of the parent of $a_j$ if $a_j$ is a left child. The distinguished ancestor of $a_j$ is the first element on the path from $a_j$ to the root which is known to be smaller than or equal to $a_j$ by (3). Moreover, any subtree rooted by $a_j$, together with the distinguished ancestor $a_i$ of $a_j$, forms again a weak heap with root $a_i$ by considering $a_j$ as right child of $a_i$.
The basic operation for creating a weak heap is the $\mathit{join}$ operation, which combines two weak heaps into one. Let $a_i$ and $a_j$ be two nodes in a weak heap such that $a_i$ is smaller than or equal to every element in the left subtree of $a_j$. Conceptually, $a_j$ and its right subtree form a weak heap, while $a_i$ and the left subtree of $a_j$ form another weak heap. (Note that $a_i$ is not allowed to be in the subtree with root $a_j$.) The result of $\mathit{join}$ is a weak heap with root at position $i$. If $a_j < a_i$, the two elements are swapped and $r_j$ is flipped. As a result, the new element $a_j$ will be smaller than or equal to every element in its right subtree, and the new element $a_i$ will be smaller than or equal to every element in the subtree rooted at $a_j$. To sum up, $\mathit{join}$ requires constant time and involves one element comparison and a possible element swap in order to combine two weak heaps into a new one.
The construction of a weak heap consisting of $n$ elements requires $n - 1$ comparisons. In the standard bottom-up construction of a weak heap the nodes are visited one by one. Starting with the last node in the array and moving to the front, the two weak heaps rooted at a node and its distinguished ancestor are joined. The amortized cost to get from a node to its distinguished ancestor is $O(1)$ [5].
When using weak heaps for sorting, the minimum is removed and the weak heap condition restored until the weak heap becomes empty. After extracting an element from the root, first the special path from the root is traversed top-down, and then, in a bottom-up process, the weak-heap property is restored using at most $\lceil \log n \rceil$ join operations. (The special path is established by going once to the right and then to the left as far as possible.) Hence, extracting the minimum requires at most $\lceil \log n \rceil$ comparisons.
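The array-based weak heap, the join operation, the bottom-up construction, and the extraction loop can be sketched as follows. This is a minimal illustration of the standard in-place WeakHeapsort procedure, not the authors' implementation; the class and function names are assumptions made here. Since the heap is min-ordered, moving each minimum to the end of the shrinking heap leaves the array in descending order, so the sketch reverses it at the end.

```python
class WeakHeap:
    """Array-based min weak heap with reverse bits (illustrative sketch)."""

    def __init__(self, items):
        self.a = list(items)
        self.r = [0] * len(self.a)      # reverse bits, one per node
        self.comparisons = 0

    def d_ancestor(self, j):
        """Distinguished ancestor of node j (requires j != 0)."""
        # The left child of p has index 2p + r[p]; climb while j is a left child.
        while (j & 1) == self.r[j >> 1]:
            j >>= 1
        return j >> 1

    def join(self, i, j):
        """Combine the weak heaps at i and j; afterwards a[i] <= a[j]."""
        self.comparisons += 1
        if self.a[j] < self.a[i]:
            self.a[i], self.a[j] = self.a[j], self.a[i]
            self.r[j] ^= 1              # exchange the subtrees of node j

    def construct(self):
        """Bottom-up construction with exactly n - 1 joins."""
        for j in range(len(self.a) - 1, 0, -1):
            self.join(self.d_ancestor(j), j)


def weak_heapsort(items):
    """Return (sorted list, number of element comparisons)."""
    h = WeakHeap(items)
    n = len(h.a)
    if n < 2:
        return h.a, h.comparisons
    h.construct()
    for m in range(n - 1, 1, -1):
        h.a[0], h.a[m] = h.a[m], h.a[0]     # minimum leaves the heap
        x = 1                               # go once to the right ...
        while 2 * x + h.r[x] < m:           # ... then left as far as possible
            x = 2 * x + h.r[x]
        while x > 0:                        # restore the ordering bottom-up
            h.join(0, x)
            x >>= 1
    h.a[0], h.a[1] = h.a[1], h.a[0]
    h.a.reverse()                           # descending -> ascending
    return h.a, h.comparisons
```

For example, `weak_heapsort([5, 3, 8, 1])` returns the sorted list together with the number of comparisons spent; the construction phase alone always accounts for $n - 1$ of them.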
Now, we introduce a modification to the standard procedure described by Dutton [3], which has a slightly improved performance, but requires extra space. We call this modified algorithm ExternalWeakHeapsort. This is because it needs an extra output array, to which the elements extracted from the weak heap are moved. On average ExternalWeakHeapsort requires fewer comparisons than RelaxedWeakHeapsort [4]. Integrated in QuickXsort we can implement it without extra space other than the extra bits and some other extra bits. We introduce an additional array active and weaken the requirements of a weak heap: we also allow nodes on other than the last two levels to have fewer than two children. Nodes where the active bit is set to false are considered to have been removed. ExternalWeakHeapsort works as follows: First, a usual weak heap is constructed using $n - 1$ comparisons. Then, until the weak heap becomes empty, the root (which is the minimal element) is moved to the output array and the resulting hole has to be filled with the minimum of the remaining elements (so far the only difference to normal WeakHeapsort is that there is a separate output area).
The hole is filled by searching the special path from the root to a node $a_x$ which has no left child. Note that the nodes on the special path are exactly the nodes having the root as distinguished ancestor. Finding the special path does not need any comparisons, since one only has to follow the reverse bits. Next, the element of the node $a_x$ is moved to the root, leaving a hole. If $a_x$ has a right subtree (i.e., if $a_x$ is the root of a weak heap with more than one element), this hole is filled by applying the hole-filling algorithm recursively to the weak heap with root $a_x$. Otherwise, the active bit of $a_x$ is set to false. Now, the root of the whole weak heap together with the subtree rooted by $a_x$ forms a weak heap. However, it remains to restore the weak heap condition for the whole weak heap. Except for the root and $a_x$, all nodes on the special path together with their right subtrees form weak heaps. Following the special path upwards, these weak heaps are joined with their distinguished ancestor as during the weak heap construction (i.e., successively they are joined with the weak heap consisting of the root and the already treated nodes on the special path together with their subtrees). Once all the weak heaps on the special path are joined, the whole array forms a weak heap again.
Theorem 3.
For $n = 2^k$, ExternalWeakHeapsort performs exactly the same comparisons as Mergesort applied to a fixed permutation of the same input array.
Proof.
First, recall the Mergesort algorithm: The left half and the right half of the array are sorted recursively and then the two subarrays are merged together by always comparing the smallest elements of both arrays and moving the smaller one to the separate output area. Now, we move to WeakHeapsort. Consider the tree as it is initialized with all reverse bits set to false. Let $a_0$ be the root and $a_1$ its only child (not the elements but the positions in the tree). We call $a_0$ together with the left subtree of $a_1$ the left part of the tree and we call $a_1$ together with its right subtree the right part of the tree. That means the left part and the right part form weak heaps on their own. The only time an element is moved from the right to the left part or vice versa is when the data elements $a_0$ and $a_1$ are exchanged. However, always one of the data elements of $a_0$ and $a_1$ comes from the right part and one from the left part. After extracting the minimum $a_0$, it is replaced by the smallest remaining element of the part $a_0$ came from. Then, the new $a_0$ and $a_1$ are compared again and so on. Hence, for extracting the elements in sorted order from the weak heap the following happens. First, the smallest elements of the left and right part are determined, then they are compared, and finally the smaller one is moved to the output area. This procedure repeats until the weak heap is empty. This is exactly how the recursion of Mergesort works: always the smallest elements of the left and right part are compared and the smaller one is moved to the output area. If $n = 2^k$, then the left and right parts for Mergesort and WeakHeapsort have the same sizes. ∎
By [10, 5.2.4–13] we obtain the following corollary.
Corollary 1 (Average Case ExternalWeakHeapsort).
For $n = 2^k$ the algorithm ExternalWeakHeapsort uses approximately $n \log n - 1.26n$ comparisons in the average case.
If is not a power of two, the sizes of left and right parts of WeakHeapsort are less balanced than the left and right parts of ordinary Mergesort and one can expect a slightly higher number of comparisons. For QuickWeakHeapsort, the half of the array which is not sorted by ExternalWeakHeapsort is used as output area. Whenever the root is moved to the output area, the element that occupied that place before is inserted as a dummy element at the position where the active bit is set to false. Applying Thm. 1, we obtain the rough estimate of comparisons for the average case of QuickWeakHeapsort.
4 QuickMergesort
As another example for QuickXsort we consider QuickMergesort. For the Mergesort part we use standard (top-down) Mergesort, which can be implemented using $m$ extra spaces to merge two arrays of length $m$ (there are other methods like in [12] which require less space – but for our purposes this is good enough). The procedure is depicted in Fig. 2. We sort the larger half of the partitioned array with Mergesort as long as we have one third of the whole array as temporary memory left; otherwise we sort the smaller part with Mergesort.
Hence, the part which is not sorted by Mergesort always provides enough temporary space. When a data element should be moved to or from the temporary space, it is swapped with the element occupying the respective position. Since Mergesort moves through the data from left to right, it is always known which are the elements to be sorted and which are the dummy elements. Depending on the implementation, the extra space needed is $O(\log n)$ words for the recursion stack of Mergesort. By avoiding recursion this can even be reduced to $O(1)$. Thm. 1 together with [10, 5.2.4–13] yields the next result.
Theorem 4 (Average Case QuickMergesort).
QuickMergesort is an internal sorting algorithm that performs at most $n \log n - 1.26n + o(n)$ comparisons on average.
We can do even better if we sort small subarrays with another algorithm Z requiring fewer comparisons but extra space and more moves, e.g., Insertionsort or MergeInsertion. If we use $M$ elements for the base case of Mergesort, we have to call Z at most $n/M$ times. In this case we can allow additional operations of Z like moves in the order of $O(M^2)$, given that $M \in O(\log n)$.
Note that for the next theorem we only need that the size of the base cases grows as $n$ grows. Nevertheless, $O(\log n)$ is the largest growing value we can choose if we apply a base case algorithm with $\Theta(n^2)$ moves and want to achieve an $O(n \log n)$ overall running time.
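The cost balance behind this choice can be written out as a short worked equation. It is a sketch under the assumption that Z performs at most $cM^2$ additional operations on a base case of $M$ elements; the constant $c$ is introduced here only for illustration. Summing over the at most $n/M$ base cases gives:

```latex
\[
  \frac{n}{M} \cdot c M^{2} \;=\; c\, n M \;\in\; O(n \log n)
  \qquad \text{whenever } M \in O(\log n).
\]
```

A base case size growing faster than $\log n$ would make the quadratic work of Z dominate an $O(n \log n)$ running time.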
Theorem 5 (QuickMergesort with Base Case).
Let be some sorting algorithm with comparisons on the average and other operations taking at most time. If base cases of size are sorted with Z, QuickMergesort uses at most comparisons and other instructions on the average.
Proof.
By Thm. 1 and the preceding remark, the only thing we have to prove is that Mergesort with base case Z requires on average at most comparisons, given that Z needs comparisons on average. The latter means that for every we have for large enough.
Let denote the average case number of comparisons of Mergesort with base cases of size sorted with Z and let . Since grows as grows, we have that for large enough and . For we have and by induction we see that . Hence, also for large enough.
∎
Using Insertionsort we obtain the following result. Here, $\ln$ denotes the natural logarithm. As we did not find a result in the literature, we also provide a proof. Recall that Insertionsort inserts the elements one by one into the already sorted sequence by binary search.
Proposition 2 (Average Case of Insertionsort).
The sorting algorithm Insertionsort needs comparisons on the average where .
Corollary 2 (QuickMergesort with Base Case Insertionsort).
If we use Insertionsort as base case, QuickMergesort uses at most $n \log n - 1.38n + o(n)$ comparisons and $O(n \log n)$ other instructions on the average.
Proof of Prop. 2.
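First, we take a look at the average number of comparisons to insert one element into a sorted array of $k-1$ elements by binary insertion.
To insert a new element into $k-1$ elements either needs $\lceil \log k \rceil - 1$ or $\lceil \log k \rceil$ comparisons. There are $k$ positions where the element to be inserted can end up, each of which is equally likely. For $2^{\lceil \log k \rceil} - k$ of these positions $\lceil \log k \rceil - 1$ comparisons are needed. For the other $k - (2^{\lceil \log k \rceil} - k) = 2k - 2^{\lceil \log k \rceil}$ positions $\lceil \log k \rceil$ comparisons are needed. This means
$$\frac{(2^{\lceil \log k \rceil} - k)(\lceil \log k \rceil - 1) + (2k - 2^{\lceil \log k \rceil})\lceil \log k \rceil}{k} = \lceil \log k \rceil + 1 - \frac{2^{\lceil \log k \rceil}}{k}$$
comparisons are needed on average. By [10, 5.3.1–(3)], we obtain for the average case for sorting $n$ elements: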
We examine the last sum separately. In the following we write for the harmonic sum with the Euler constant.
Hence, we have
In order to obtain a numeric bound for , we compute and then replace by . This yields a function
which oscillates between and for . For , its value is . ∎
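The average cost of a single binary insertion used in this proof can be checked by brute force. The sketch below is not from the paper: it counts the comparisons of a balanced binary search for each of the $k$ possible positions among $k-1$ sorted keys, and compares the exact average with the closed form $\lceil \log k \rceil + 1 - 2^{\lceil \log k \rceil}/k$. Function names are chosen here for illustration.

```python
from fractions import Fraction


def binary_insertion_comparisons(k, pos):
    """Comparisons needed to locate slot pos (0 <= pos < k) among k - 1 keys."""
    lo, hi, count = 0, k - 1, 0      # candidate slots are lo..hi
    while lo < hi:
        mid = (lo + hi) // 2
        count += 1                   # compare the new element with key number mid
        if pos <= mid:               # new element is smaller than that key
            hi = mid
        else:
            lo = mid + 1
    return count


def average_by_enumeration(k):
    """Exact average over all k equally likely positions."""
    total = sum(binary_insertion_comparisons(k, p) for p in range(k))
    return Fraction(total, k)


def average_closed_form(k):
    """ceil(log2 k) + 1 - 2**ceil(log2 k) / k as an exact fraction."""
    depth = (k - 1).bit_length()     # equals ceil(log2 k) for k >= 1
    return depth + 1 - Fraction(2 ** depth, k)


if __name__ == "__main__":
    for k in (1, 2, 3, 5, 8, 21):
        print(k, average_by_enumeration(k), average_closed_form(k))
```

Both functions agree for every $k$, because halving the range of candidate slots places all leaves of the search tree on the two levels $\lceil \log k \rceil - 1$ and $\lceil \log k \rceil$.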
Base cases of growing size always lead to a constant factor overhead in running time if an algorithm with a quadratic number of total operations is applied. Therefore, in the experiments we will also consider constant size base cases, which offer a slightly worse bound for the number of comparisons, but are faster in practice. We do not analyze them separately, since the preferred choice for the size depends on the type of data to be sorted and the system on which the algorithms run.
5 MergeInsertion
MergeInsertion by Ford and Johnson [7] is one of the best sorting algorithms in terms of number of comparisons. Hence, it can be applied for sorting base cases of QuickMergesort, which yields even better results than Insertionsort. Therefore, we want to give a brief description of the algorithm and our implementation. While the description is simple, MergeInsertion is not easy to implement efficiently. Our implementation is based on weak heaps and uses extra bits. Algorithmically, MergeInsertion can be described as follows (an intuitive example for $n = 21$ can be found in [10]).

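1. Arrange the input such that $x_i \ge x_{i+\lfloor n/2 \rfloor}$ for $1 \le i \le \lfloor n/2 \rfloor$ with one comparison per pair. Let $a_i = x_i$ and $b_i = x_{i+\lfloor n/2 \rfloor}$ for $1 \le i \le \lfloor n/2 \rfloor$, and $b_{\lfloor n/2 \rfloor + 1} = x_n$ if $n$ is odd.

2. Sort the values $a_1, \dots, a_{\lfloor n/2 \rfloor}$ recursively with MergeInsertion.

3. Rename the solution as follows: $b_1 \le a_1 \le a_2 \le \dots \le a_{\lfloor n/2 \rfloor}$ and insert the elements $b_2, \dots, b_{\lceil n/2 \rceil}$ via binary insertion, following the ordering $b_3$, $b_2$, $b_5$, $b_4$, $b_{11}$, $b_{10}$, $\dots$, $b_6$, $\dots$, $b_{t_k}$, $b_{t_k - 1}$, $\dots$, $b_{t_{k-1}+1}$, $\dots$ into the main chain, where $t_k = (2^{k+1} + (-1)^k)/3$.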
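The insertion order of the third step can be generated mechanically from the numbers $t_k$. The following is a small sketch, not the authors' implementation: it lists the indices $i$ of the elements $b_i$ in the order in which they are inserted, given the largest index $m$. The names `t` and `insertion_order` are chosen here for illustration.

```python
def t(k):
    """t_k = (2**(k + 1) + (-1)**k) / 3, i.e. 1, 3, 5, 11, 21, 43, ..."""
    return (2 ** (k + 1) + (-1) ** k) // 3


def insertion_order(m):
    """Indices i (2 <= i <= m) of the elements b_i in insertion order."""
    order = []
    k = 2
    while t(k - 1) < m:
        top = min(t(k), m)                        # the last block may be incomplete
        order.extend(range(top, t(k - 1), -1))    # b_top, ..., b_{t_{k-1}+1}
        k += 1
    return order


if __name__ == "__main__":
    print(insertion_order(11))   # [3, 2, 5, 4, 11, 10, 9, 8, 7, 6]
    print(insertion_order(7))    # [3, 2, 5, 4, 7, 6]
```

Within each block the indices run downwards, which is what lets every element of a block be inserted into a chain short enough for the same number of binary search steps.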
Due to the different renamings, the recursion, and the change of link structure, the design of an efficient implementation is not immediate. Our proposed implementation of MergeInsertion is based on a tournament tree representation with weak heaps as in Sect. 3. The pseudocode implementations for all the operations to construct a tournament tree with a weak heap and to access the partners in each round are shown in Fig. 2 in the appendix. (Note that for simplicity in the above formulation the indices and the order are reversed compared to our implementation.)
One main subroutine of MergeInsertion is binary insertion. The call  inserts the element at position between position and by binary insertion. (The pseudocode implementations for the binary search routine is shown in Fig. 3 in the appendix.) In this routine we do not move the data elements themselves, but we use an additional index array to point to the elements contained in the weak heap tournament tree and move these indirect addresses. This approach has the advantage that the relations stored in the tournament tree are preserved.
The most important procedure for MergeInsertion is the organization of the calls for . After adapting the addresses for the elements (w. r. t. the above description) in the second part of the array, the algorithm calls the binary insertion routine with appropriate indices. Note that we always use comparisons for all elements of the th block (i. e., the elements ) even if there might be the chance to save one comparison. By introducing an additional array, which for each contains the current index of , we can exploit the observation that not always comparisons are needed to insert an element of the th block. In the following we call this the improved variant. The pseudocode of the basic variant is shown in Fig. 1. The last sequence is not complete and is thus tackled in a special case.
Theorem 6 (Average Case of MergeInsertion).
The sorting algorithm MergeInsertion needs comparisons on the average, where .
Corollary 3 (QuickMergesort with Base Case MergeInsertion).
When using MergeInsertion as base case, QuickMergesort needs at most $n \log n - 1.3999n + o(n)$ comparisons and $O(n \log n)$ other instructions on the average.
Proof of Thm. 6.
According to Knuth [10], MergeInsertion requires at most comparisons in the worst case, where . In the following we want to analyze the average savings relative to the worst case. Therefore, let denote the average number of comparisons of the insertion steps of MergeInsertion, i. e., all comparisons minus the efforts for the weak heap construction, which always takes place. Then, we obtain the recurrence relation
with such that and some . As we do not analyze the improved version of the algorithm, the insertion of elements with index less or equal requires always the same number of comparisons. Thus, the term is independent of the data. However, inserting an element after may either need or comparisons. This is where comes from. Note that only depends on . We split into with
and  
and  
and  
For the average case analysis, we have that is independent of the data. For we have , and hence, . Since otherwise is nonnegative, this proves that exactly for the average case matches the worst case.
Now, we have to estimate for arbitrary . We have to consider the calls to binary insertion more closely. To insert a new element into an array of elements either needs or comparisons. For a moment assume that the element is inserted at every position with the same probability. Under this assumption the analysis in the proof of Prop. 2 is valid, which states that
comparisons are needed on average.
The problem is that in our case the probability at which position an element is inserted is not uniformly distributed. However, it is monotonically increasing with the index in the array (indices as in our implementation). Informally speaking, this is because if an element is inserted further to the right, then for the following elements there are more possibilities to be inserted than if the element is inserted on the left.
Now,  can be implemented such that for an odd number of positions the next comparison is made such that the larger half of the array is the one containing the positions with lower probabilities. (In our case, this is the part with the lower indices – see Fig. 3.) That means the less probable positions lie on rather longer paths in the search tree, and hence, the average path length is better than in the uniform case. Therefore, we may assume a uniform distribution in the following as an upper bound.
In each of the recursion steps we have calls to binary insertion into sets of size elements each. Hence, for inserting one element, the difference to the worst case is . Summing up, we obtain for the average savings w. r. t. the worst case number the recurrence
For we write with and we set
Recall that we have . Thus, and coincide for most and differ by at most 1 for a few values where is close to or . Since in both cases is smaller than some constant, this implies that and differ by at most a constant. Furthermore, and differ by at most a constant. Hence, we have:
Since we have , this resolves to
With this means up to terms