The Euler Path to Static Level-Ancestors
Abstract
Suppose that a rooted tree T is given for preprocessing. The level-ancestor problem is to answer quickly queries of the following form. Given a vertex v and an integer ℓ, find the ℓ-th vertex on the path from the root of the tree to v. Algorithms that achieve a linear time bound for preprocessing and a constant time bound for a query have been published by Dietz (1991), Alstrup and Holm (2000), and Bender and Farach (2002). The first two algorithms address dynamic versions of the problem; the last addresses the static version only and is the simplest so far. The purpose of this note is to expose another simple algorithm, derived from a complicated PRAM algorithm by Berkman and Vishkin (1990, 1994). We further show some easy extensions of its functionality, adding queries for descendants and level successors as well as ancestors, extensions for which the formerly known algorithms are less suitable.
Keywords: algorithms, data structures, trees.
1 Introduction
The level-ancestor problem is defined as follows. Suppose that a rooted tree T is given for preprocessing. Answer quickly queries of the following form. Given a vertex v and an integer ℓ, find the ancestor of v in T whose level is ℓ, where the level of the root is 0.
Two related tree queries are: Level Successor (given v, find the next vertex, in preorder, on the same level as v) and Level Descendant (given v and ℓ, find the first descendant of v on level ℓ, if one exists).
The level-ancestor problem is a relative of the better-known LCA (Least Common Ancestor) problem. In their seminal paper on LCA problems [12], Harel and Tarjan solve the level-ancestor problem on certain special trees as a subroutine of an LCA algorithm. An application of the level-ancestor problem is mentioned already in [1], although an implementation of this data structure had not yet been published at the time.
The first published algorithms for the level-ancestor problem were a PRAM algorithm by Berkman and Vishkin [6, 7], and a serial (RAM) algorithm by Dietz [8] that accommodates dynamic updates. Alstrup and Holm [3] gave an algorithm that solves an extended dynamic problem, and has the additional advantage that its static-only version is simpler than the previous algorithms. Finally, the simplest algorithm (for the static problem only) was given by Bender and Farach [5].
It is curious that very complicated algorithms addressing theoretical challenges, namely dynamization and parallelization, were published for this problem earlier than any simple algorithm for the most basic and useful variant (static, on a serial RAM). It is also curious that the essential ideas for such an algorithm do appear in Berkman and Vishkin's solution, but this potential contribution was missed, since they concentrated on the PRAM problem, for which they gave a notoriously impractical algorithm (involving a precomputed table of prohibitive size). The first goal of this paper is to rectify this situation by presenting a sequential algorithm based on the approach of Berkman and Vishkin. This is not done just for historical interest, but because the algorithm presented here is simply useful: it is efficient and easy to implement (and has been implemented). Furthermore, we shall present a few useful extensions that were either unsupported by previous work, or supported in much more complicated ways. Specifically, we show how to accommodate level-successor and level-descendant queries, in addition to level-ancestor queries. Together, these two queries are useful for iterating over the descendants of a vertex at a given level. For example applications of the extension, see [14, 15].
Technical remarks.
Since we only consider data structures that support constant-time queries, we refer to the algorithms by their preprocessing cost. That is, an O(n)-time algorithm means linear-time preprocessing. The input to the algorithm is a tree whose precise representation is of little consequence (since standard representations are interchangeable within linear time). We assume that vertices are identified by numbers 0 through n − 1, where n is the number of vertices.
2 The Euler Tour and the Find-Smaller problem
Like the better-known LCA algorithm that also originates from [6], this level-ancestor algorithm is based on the following key ideas:

- The Euler tour representation of a tree reduces the problem to a problem on a linear array.

- A data structure with O(n log n) preprocessing time (and size) is given for this array problem, and is then improved to O(n) using the microset technique.
The microset technique is also used in other work on level ancestors [3, 5, 13, 11], but those solutions all apply at least part of the processing to the tree, using various methods of decomposition into subtrees. Here, all processing is applied to the Euler-tour array.
Consider a tree T, rooted at some vertex r. For each edge (u, v) in T, add its antiparallel edge (v, u). This results in a directed graph T′. Since the indegree and outdegree of each vertex of T′ are the same, T′ has an Euler tour that starts and ends in the root of T. Note that the tour consists of 2(n − 1) arcs, hence 2n − 1 vertices including the endpoints.
By a straightforward application of DFS on T we can compute the following information:

- An array E such that E[i] is the i-th vertex on the Euler tour, for 1 ≤ i ≤ 2n − 1.

- An array L such that L[i] is the level of the i-th vertex on the Euler tour.

- An array R such that R[v] is the index of the last occurrence of v in the array E, called the representative of v.
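As a concrete illustration, here is a minimal Python sketch of this DFS (the children-list representation, the function names, and the final linear-scan query are my own choices; the scan is only a stand-in for the constant-time machinery developed in the following sections):

```python
def euler_tour(children, root=0):
    """E, L, R arrays for the Euler tour of a rooted tree.

    `children[v]` lists the children of vertex v; vertices are 0..n-1.
    Tour positions are 1..2n-1 (index 0 of E and L is unused).
    """
    n = len(children)
    E, L = [None], [None]
    R = [0] * n                          # last occurrence (the representative)
    E.append(root); L.append(0); R[root] = 1
    stack = [(root, 0, iter(children[root]))]
    while stack:
        v, lev, it = stack[-1]
        c = next(it, None)
        if c is None:                    # done with v: tour returns to its parent
            stack.pop()
            if stack:
                u, ul, _ = stack[-1]
                E.append(u); L.append(ul); R[u] = len(E) - 1
        else:                            # tour descends into child c
            E.append(c); L.append(lev + 1); R[c] = len(E) - 1
            stack.append((c, lev + 1, iter(children[c])))
    return E, L, R

def level_ancestor_naive(E, L, R, v, lev):
    """Level-lev ancestor of v: first vertex after R[v] on level lev (linear scan)."""
    for j in range(R[v] + 1, len(E)):
        if L[j] == lev:
            return E[j]
    return None
```

For the 4-vertex tree with root 0, children 1 and 2, and a child 3 under 2, the tour is 0, 1, 0, 2, 3, 2, 0.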
Observation 1
Let ℓ < level(v). Vertex u is the level-ℓ ancestor of vertex v if and only if u is the first vertex after the last occurrence of v in the Euler tour such that level(u) = ℓ.
By this observation, the computation of the arrays E, L and R reduces the level-ancestor problem to the following
FIND-SMALLER (FS) Problem.
Input for preprocessing: An array A[1..n] of integers.
Query: Let 1 ≤ i ≤ n and let x be an integer. A query FS(i, x) seeks the minimal j > i such that A[j] < x. If no such j exists, the answer is 0.
Our goal is to preprocess the array so that each query can be processed in O(1) time. Indeed, by Observation 1, the level-ℓ ancestor of v is E[FS(R[v], ℓ + 1)], with FS applied to the array L.
The Euler tour implies that the difference between successive elements of the array L is exactly one in absolute value. Therefore, for our goal, it suffices to solve the following restricted problem:
FS±1 is the Find-Smaller problem restricted to arrays where for all i, |A[i + 1] − A[i]| = 1.
We remark that the general Find-Smaller problem cannot be solved with O(1) query time if one requires a polynomial-space data structure and assumes a polylogarithmic word length; the reason is that the static predecessor problem, for which non-constant lower bounds are known [4], can easily be reduced to it.
Another preparatory definition is the following. Let n be a power of two and consider a balanced binary tree of 2n − 1 nodes, numbered 1 through 2n − 1 in symmetric order (thus, 1 is the leftmost leaf and 2n − 1 the rightmost). The height of node v is h(v), the position of the rightmost nonzero bit in the binary representation of v, counting from 0. We denote by LCA(u, v) the least common ancestor of nodes u and v. For the algorithms, we assume that LCA(u, v) is computed in constant time. In fact, it can be computed using standard machine instructions and the MSB (most significant set bit) function; this function is implemented as an instruction in many processors, but could also be provided by a precomputed table. Following is a useful property of the LCA(·, ·) function.
Lemma 2
If i < j are two nodes of the complete binary tree, and k = LCA(i, j), then i ≤ k ≤ j, and both i and j lie in the subtree of k, which spans the interval of nodes [k − 2^{h(k)} + 1, k + 2^{h(k)} − 1]; in particular, k − i ≤ 2^{h(k)} − 1 and j − k ≤ 2^{h(k)} − 1.
We omit the easy proof. Finally, for uniformity of notation, we define LCA(i, i) to be i.
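A Python sketch of h and LCA under these definitions (the MSB is obtained here with int.bit_length rather than a machine instruction; the handling of the nested-subtree cases and the function names are my own choices):

```python
def h(v):
    """Height of node v: position of the rightmost set bit of v, counting from 0."""
    return (v & -v).bit_length() - 1

def lca(u, v):
    """LCA of nodes u and v of the complete binary tree numbered in symmetric order.

    Implemented with the MSB of u XOR v; the cases where one node lies in the
    other's subtree (node w spans [w - 2^h(w) + 1, w + 2^h(w) - 1]) are handled
    first.  lca(i, i) = i, matching the convention in the text.
    """
    if u > v:
        u, v = v, u
    if v <= u + (1 << h(u)) - 1:          # v inside u's subtree
        return u
    if u >= v - (1 << h(v)) + 1:          # u inside v's subtree
        return v
    m = (u ^ v).bit_length() - 1          # MSB position of u XOR v
    return ((u >> (m + 1)) << (m + 1)) | (1 << m)
```

In a 15-node tree (n = 8), the leaves are the odd numbers and the root is 8; for example, lca(3, 5) = 4. Equivalently, LCA(i, j) is the unique node of maximal height in the interval [i, j], which is what Lemma 2 captures.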
3 Basic constant-time-query algorithm
In this section we describe an algorithm with O(n log n) preprocessing time for the FS±1 problem. Throughout this section and the sequel, we make the simplifying assumption that n is a power of two.
Our description of the basic algorithm has two steps. (1) The output of the preprocessing algorithm is specified, and it is shown how to process an FS query in constant time using this output. (2) The preprocessing algorithm is described. This order helps to motivate the presentation.
3.1 Data structure and query processing
For each i, 1 ≤ i ≤ n, the preprocessing algorithm constructs an array B_i with entries indexed 0 through 2^{h(i)+1} (here h(i) refers to position i as a node of the binary tree introduced above). In B_i[j] we store the answer to FS(i, A[i] − j).
A query FS(i, x) is processed as follows (we assume that x ≤ A[i]; queries with x > A[i] do not arise in our reduction, and are easily dispatched using the ±1 restriction). Let d = A[i] − x. Whenever an index of the form i + d, or k + (A[k] − x) below, exceeds n, the ±1 restriction implies that no j satisfies A[j] < x, and the answer 0 is returned.
(1) If d ≤ 2^{h(i)+1}, return B_i[d].
(2) Otherwise, let k = LCA(i, i + d).
If A[k] − x ≤ 2^{h(k)+1}, return B_k[A[k] − x].
(3) Otherwise, let k′ = LCA(i, k + (A[k] − x)); return B_{k′}[A[k′] − x].
Figure 1 demonstrates the structure for a 16-element array, except that all the arrays are truncated to 8 elements. In this example, one query is answered immediately by Case (1), while another is answered via Case (3).
We now explain the algorithm. Correctness of Case (1) is obvious from the definition of the structure. The correctness of Cases (2) and (3) hinges on two claims. The first, Claim 3 below, shows that the reference to B_{k′} in Case (3) is within bounds (in Case (2), the condition tested ensures this); the second, Claim 4, shows that the answer found there is the right one (in Case (3) it is applied twice, at i and then at k).
Claim 3
In Case (3), we have 0 ≤ A[k′] − x ≤ 2^{h(k′)+1}.
Proof. For the first inequality: since we are dealing with FS±1, every position m with i ≤ m ≤ i + d satisfies A[m] ≥ A[i] − (m − i) ≥ x, and every position m with k ≤ m ≤ k + (A[k] − x) satisfies A[m] ≥ A[k] − (m − k) ≥ x. Since k ≤ i + d, these two ranges together cover [i, k + (A[k] − x)], and k′ lies in this interval by Lemma 2; hence A[k′] ≥ x. For the second inequality: consider the complete binary tree of 2n − 1 nodes, used to define LCA. The algorithm sets k′ = LCA(i, k + (A[k] − x)), so by Lemma 2, k′ − i ≤ 2^{h(k′)} − 1. Moreover, k′ is a proper ancestor of k: the subtree of k extends only to k + 2^{h(k)} − 1 < k + (A[k] − x), while the subtree of k′ contains both k and k + (A[k] − x); hence h(k′) > h(k), and since i and i + d lie in the subtree of k, we get d ≤ 2^{h(k)+1} − 2 ≤ 2^{h(k′)} − 2. Since the difference between consecutive elements is ±1, we have A[k′] ≤ A[i] + (k′ − i), so we conclude that A[k′] − x ≤ d + (k′ − i) ≤ (2^{h(k′)} − 2) + (2^{h(k′)} − 1) < 2^{h(k′)+1}.
Claim 4
If x ≤ A[i] and k ≤ i + (A[i] − x), then FS(i, x) = FS(k, x).
Proof. Because we are dealing with FS±1, the values A[i], A[i + 1], …, A[i + (A[i] − x)] are all at least A[i] − (A[i] − x) = x. Thus, the answer to FS(i, x) lies beyond position i + (A[i] − x) ≥ k, and is also the answer to FS(k, x).
3.2 The preprocessing algorithm
It is easy to verify that the size of the data structure is O(n log n). To construct it within this time bound, we perform a sweep from right to left; that is, for i = n, n − 1, …, 1 we maintain an array V where V[y] is the index of the first j > i such that A[j] = y (or 0 by default). Note that this is not the same as a Find-Smaller answer: V is indexed by a value, not by a query. By the ±1 restriction, however, the first position after i whose value is smaller than x is precisely the first position holding the value x − 1; hence B_i[j] = V[A[i] − j − 1]. Initializing V for i = n is trivial, and updating it when i is decremented is constant-time (set V[A[i + 1]] to i + 1). For each i, B_i is then just a copy of an appropriate section of V. This completes the preprocessing.
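The whole of this section can be condensed into the following Python sketch. It follows the constants used above (arrays B_i of 2^{h(i)+1} + 1 entries), adds explicit guards that return 0 when an index of the form i + d falls beyond the array, and uses a dictionary for the value-indexed sweep array V; the class and variable names are mine:

```python
def h(v):
    """Height of node v: index of its lowest set bit."""
    return (v & -v).bit_length() - 1

def lca(u, v):
    """LCA in the complete binary tree numbered in symmetric order."""
    if u > v:
        u, v = v, u
    if v <= u + (1 << h(u)) - 1:      # v inside u's subtree
        return u
    if u >= v - (1 << h(v)) + 1:      # u inside v's subtree
        return v
    m = (u ^ v).bit_length() - 1      # MSB of u XOR v
    return ((u >> (m + 1)) << (m + 1)) | (1 << m)

class BasicFS:
    """O(n log n)-size Find-Smaller structure for a +-1 array A[1..n] (A[0] unused)."""

    def __init__(self, A):
        self.A = A
        self.n = n = len(A) - 1
        V = {}                        # V[y] = first position j > i with A[j] == y
        self.B = [None] * (n + 1)
        for i in range(n, 0, -1):
            if i < n:
                V[A[i + 1]] = i + 1   # i+1 is now the first occurrence to the right
            # B_i[j] = FS(i, A[i]-j); under +-1 this is the first position
            # after i holding the value A[i]-j-1 exactly
            self.B[i] = [V.get(A[i] - j - 1, 0)
                         for j in range((1 << (h(i) + 1)) + 1)]

    def query(self, i, x):
        """First j > i with A[j] < x, or 0 if none; requires x <= A[i]."""
        A, n = self.A, self.n
        d = A[i] - x
        if d <= 1 << (h(i) + 1):              # case (1)
            return self.B[i][d]
        if i + d > n:                         # +-1: the values cannot get below x
            return 0
        k = lca(i, i + d)                     # case (2)
        if A[k] - x <= 1 << (h(k) + 1):
            return self.B[k][A[k] - x]
        if k + (A[k] - x) > n:
            return 0
        k2 = lca(i, k + (A[k] - x))           # case (3)
        return self.B[k2][A[k2] - x]
```

On any ±1 array, query(i, x) returns the same position as a brute-force scan, whichever of the three cases it goes through.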
4 Improved constant-time-query algorithm
In this section we describe an algorithm with O(n) preprocessing time, based on the solution of the former section together with the microset technique. The essence of the technique is to fix a block length b = Θ(log n) (say b = ⌈(log₂ n)/2⌉) and to sparsify the structure of the last section by using it only on block boundaries, reducing its cost to O(n), while for intra-block queries we use an additional data structure, the micro structure. For presentation's sake, we now provide a specification of the micro structure and go on to describe the rest of the structure. The implementation of the micro structure will be dealt with in the following section.
For working with blocks, without resorting to numerous division operations, we shall write down some numbers (specifically, array indices) in a quotient-and-remainder notation, ⟨q, r⟩ = qb + r, where it is tacitly assumed that 0 ≤ r < b. Block q consists of the positions ⟨q, 0⟩ through ⟨q, b − 1⟩, and the position qb is called its boundary.
The Micro Structure.
This data structure is assumed to support in O(1) time the following query: FSmicro(i, x): return the smallest j ≥ i in the same block as i such that A[j] < x; if there is no such j in the block, return 0. (Whenever x ≤ A[i], this coincides with the answer to FS(i, x), provided that answer lies in the block of i.)
The FS Structure.
For each block number t, 1 ≤ t ≤ n/b, our preprocessing algorithm now constructs two arrays, anchored at the boundary position tb:

- A near array N_t such that N_t[j] stores the answer to FS(tb, A[tb] − j), for 0 ≤ j ≤ 2b (namely, the first 2b + 1 entries of the array B_{tb} of the previous section).

- A far array F_t such that F_t[j] stores the number of the block containing the answer to FS(tb, A[tb] − jb), for 0 ≤ j ≤ 2^{h(t)+1}, or 0 if there is no such position (here h is applied to the block number t).

Thus, the arrays are not only sparsified (they exist only at block boundaries), but also (for the far arrays) are their values truncated: only the block number of the answer is kept, and the argument is in units of b. The near arrays take O((n/b) · b) = O(n) space in total, and the far arrays O((n/b) log(n/b)) = O(n), for b = Ω(log n). In the example of Figure 1, with b = 4, the structures would be anchored at every fourth position.
The following fact follows from the ±1 restriction and the definition of the ⟨q, r⟩ notation:
Observation 5
If i = ⟨q, r⟩, then A[qb] − r ≤ A[i] ≤ A[qb] + r. In particular, values at positions within the same block differ by less than b.
Query processing.
A query FS(i, x), where i = ⟨q, r⟩, is processed as follows (we assume once again that x ≤ A[i]; as in the basic algorithm, any index that would fall beyond the array indicates, by the ±1 restriction, that the answer is 0).

(1) If FSmicro(i, x) ≠ 0, return it.

(2) Otherwise the answer lies beyond the block of i, and all values from position i to the end of that block are known to be at least x. Let t = q + 1 and p = tb, the next boundary. If A[p] < x, return p; otherwise let d = A[p] − x.

(2.1) If d ≤ 2b, return N_t[d].

(2.2) Otherwise, let D = ⌊d/b⌋ and k = LCA(t, t + D), where LCA and h are now applied to block numbers; let D_k = ⌊(A[kb] − x)/b⌋.

(2.2.1) If D_k ≤ 2^{h(k)+1}, let β = F_k[D_k] and x′ = A[kb] − D_k · b, and continue at (3).

(2.2.2) Otherwise, let k′ = LCA(t, k + D_k) and D_{k′} = ⌊(A[k′b] − x)/b⌋; let β = F_{k′}[D_{k′}] and x′ = A[k′b] − D_{k′} · b, and continue at (3). Claim 6 below shows that F_{k′} is accessed within its bounds.

(3) (Reached with a block number β and a threshold x′ such that x ≤ x′ < x + b, where, by the definition of the far arrays, β is the block containing the first position whose value is smaller than x′; if β = 0, return 0.)
Let z = FSmicro(βb, x′);
if x′ = x, return z; otherwise return FS(z, x), a recursive call.
The following observations justify this procedure, and also show that there is no real recursion here: the recursive call can actually be implemented as a goto, and it never loops.

- The moves from i to the boundary p, and from p onward to the boundaries kb and k′b, are justified exactly as in Claim 4: by the ±1 restriction, every value at a position between i and the new starting point is at least x, so the answer is unchanged.

- In the recursive call FS(z, x) of Case (3) we have A[z] = x′ − 1, hence A[z] − x < b. Consequently, the call is resolved either by the micro structure in Case (1), or, at the next boundary, with a difference smaller than 2b, by Case (2.1); Cases (2.2.1) and (2.2.2) are never entered again.

- For Case (2.2.2), we can show, as for the basic algorithm, that A[k′b] ≥ x (same proof as before, with block numbers in place of positions), and that the reference to F_{k′} is within bounds. The last fact is proved as Claim 6.
Claim 6
In Case (2.2.2), we have D_{k′} ≤ 2^{h(k′)+1}. The proof parallels that of Claim 3, with block numbers playing the role of array positions, and is omitted.
5 The Micro Structure
The purpose of the micro structure is to support "close" queries, i.e., return the answer to FS(i, x) provided that it does not lie beyond the block of i. There are several ways to implement this structure, with subtle differences in performance or ease of implementation. We describe two.
5.1 Berkman and Vishkin’s structure
The basis for the fast solution of in-block queries in [7] is the observation that, up to normalization, there are fewer than 2^b different possible blocks. Normalization amounts to subtracting the first element of the block from all elements; i.e., moving the "origin" to zero. Clearly, a query FSmicro(i, x) on a block C is equivalent to the same query on the normalized form of C with the threshold x − C[0], where C[0] is the first element of the block. The bound on the number of blocks follows from the ±1 restriction: a normalized block is determined by its sequence of b − 1 increments of ±1. This also allows us to conveniently represent a block as a binary string of length b − 1 (which fits in a word). We obtain the following solution.
Preprocessing: For every possible "small" array S of size b, beginning with 0 and satisfying the ±1 restriction, build a matrix M_S such that M_S[r, y] is the answer to FSmicro(r, y) in S, for every position 0 ≤ r < b and every normalized threshold −b < y ≤ b. As an identifier of S (to index the array of matrices) we use the bit representation of its increments. While preprocessing an array A of size n for FS queries, we store for every block the identifier of its normalized form.
Query: FSmicro(i, x), with i = ⟨q, r⟩, is answered by looking up M_S[r, x − A[qb]], where S is the identifier stored for block q (returning 0 if the second index is out of range).
Complexity: The query is obviously constant-time. For the preprocessing, creating the identifier array clearly takes O(n) time. The construction of a single matrix can be done quite simply in O(b³) time, and altogether we get O(2^b · b³) time and space for the matrices; with b = ⌈(log₂ n)/2⌉ this is O(√n log³ n) = o(n).
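Under the stated scheme, a Python sketch might look as follows (the encoding of identifiers, the clamping of the normalized threshold to the range covered by the matrix, and all names are my own choices; since the sketch uses 0-based arrays, it returns None rather than 0 for "no answer"):

```python
from itertools import product

def block_id(block):
    """Identifier of a +-1 block: the bit string of its b-1 increments."""
    return sum((1 if block[t + 1] - block[t] > 0 else 0) << t
               for t in range(len(block) - 1))

def build_micro_tables(b):
    """For every normalized +-1 block S of length b (S[0] = 0), a matrix M with
    M[r][y + b] = smallest m >= r such that S[m] < y, or -1 if none."""
    tables = {}
    for bits in product([0, 1], repeat=b - 1):
        S = [0]
        for bit in bits:
            S.append(S[-1] + (1 if bit else -1))
        ident = sum(bit << t for t, bit in enumerate(bits))
        tables[ident] = [[next((m for m in range(r, b) if S[m] < y), -1)
                          for y in range(-b, b + 1)]
                         for r in range(b)]
    return tables

def micro_query(tables, A, b, idents, i, x):
    """FSmicro(i, x): smallest j >= i in i's block with A[j] < x (A is 0-based)."""
    q, r = divmod(i, b)
    y = x - A[q * b]                 # normalize the threshold
    if y < -b:
        return None                  # x is below every value in the block
    y = min(y, b)                    # x above every value: any position qualifies
    m = tables[idents[q]][r][y + b]
    return None if m < 0 else q * b + m
```

The tables depend only on b, so they are shared by all arrays preprocessed with the same block length.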
5.2 A solution after Alstrup, Gavoille, Kaplan and Rauhe
Another implementation of the micro structure is suggested by an idea from [2]. In its basic form, as we next describe, it is really independent of the division into blocks, except that it only supports queries whose answer is close enough to the query position.
For each position i, let M_i be the w-bit mask whose t-th bit (1 ≤ t ≤ w) is set if and only if A[i + t] < min(A[i], A[i + 1], …, A[i + t − 1]).
From the ±1 property, one can easily deduce that the position of the j-th set bit of M_i is precisely the offset, from i, of the first occurrence of the value A[i] − j; in other words, i plus the position of the j-th set bit is the answer to FS(i, A[i] − j + 1).
The solution to the micro-structure problem, based on this observation, follows:
Preprocessing: For every i, compute and store in an array entry the bit mask M_i.
Query: FS(i, x) is answered (for A[i] − x < w) by looking up the (A[i] − x + 1)-th set bit in M_i; if that bit is at position t, the answer is i + t. The answer is 0 if there is no such bit.
This query returns answers in positions up to i + w, rather than up to the end of the block of i, which can possibly result in a faster query. As an additional advantage, w can be enlarged up to the word size, saving both time and space (there is a certain caveat; see below).
Query Complexity: The query is constant-time if we have a constant-time implementation of the function that locates the j-th set bit in a word. In the absence of hardware support, a precomputed table, of O(2^w · w) entries, can be used (but this requires limiting the value of w as before).¹
¹ Another way, which is not constant-time in the RAM model, is to search for this bit using available arithmetic/logical instructions. Since this is a tight loop without any memory access, it may be even faster than a table access on a real computer.
Preprocessing Algorithm: To compute the mask array M, we scan A from right to left while maintaining two pieces of data: the mask M_i corresponding to the current position i, and a stack that holds i together with the positions corresponding to the set bits of M_i. Each time the current position is decremented, the new position i is pushed onto the stack, possibly kicking off the top two elements first (specifically, if A[i] < A[i + 1]; in this case the stack is what supplies the offset of the bit that has to be cleared). The current mask is easily adjusted in O(1) time.
Clearly, the computation of M takes O(n) time, and this is also the space required. Fischer and Heun [9] propose to apply this technique within micro-blocks; in other words, revert to the Berkman–Vishkin approach of maintaining a table indexed by the block identifier, but keep the mask table instead of an explicit answer matrix. This saves a factor of Θ(b) in the size of the micro structure, but is likely to be competitive in speed only if the bit-finding operation we make use of is supported by hardware.
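A Python sketch of the mask structure; for clarity it builds the masks by direct scanning in O(nw) time instead of the O(n) right-to-left stack sweep just described, and it selects the j-th set bit with a loop instead of a table or hardware instruction (names are mine):

```python
def build_masks(A, w):
    """masks[i]: bit t-1 set iff A[i+t] is a new minimum of A[i..i+t], t = 1..w."""
    n = len(A)
    masks = []
    for i in range(n):
        m, cur = 0, A[i]
        for t in range(1, min(w, n - 1 - i) + 1):
            if A[i + t] < cur:          # a new minimum; under +-1 it equals cur-1
                cur = A[i + t]
                m |= 1 << (t - 1)
        masks.append(m)
    return masks

def select_bit(mask, j):
    """Position (1-based) of the j-th set bit of mask, or 0 if there is none."""
    t = 0
    while mask:
        t += 1
        if mask & 1:
            j -= 1
            if j == 0:
                return t
        mask >>= 1
    return 0

def micro_fs(A, masks, i, x):
    """FS(i, x), provided the answer is within w positions of i; else 0."""
    d = A[i] - x                # the answer is the (d+1)-th new minimum after i
    t = select_bit(masks[i], d + 1)
    return i + t if t else 0
```

For example, on A = [0, 1, 2, 1, 0, 1, 0, -1] the new minima after position 2 occur at offsets 1, 2 and 5, so masks[2] = 0b10011.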
5.3 Saving memory
In our description of the algorithm we aimed for simplicity while achieving the desired asymptotic bounds: constant-time queries together with O(n) space and preprocessing time. If, for some practical reason, the constant in the space bound is of importance, one can look for improvements, which are not hard to find. We list two simple constant-factor improvements.

- The arrays B_i need not always extend to 2^{h(i)+1} entries. In particular, assuming that all array values are nonnegative (as is the case when using FS to solve Level Ancestors), an entry B_i[j] with j ≥ A[i] corresponds to a query with a nonpositive threshold and is never consulted; such entries can be eliminated, which may give additional savings further on, depending on the shape of the tree in the Level Ancestor problem.

- The size of the arrays E, L in the reduction of Level Ancestors to the Find-Smaller problem can be cut in half by listing a vertex in E only when it is visited by the Euler tour for the last time (put otherwise, we list the vertices in postorder). It is still true that the level-ℓ ancestor of v is the first vertex occurring after the last occurrence of v whose level is ℓ. Thus, the reduction to Find Smaller is still correct. However, the FS problem that now results does not enjoy the ±1 property. But it has a similar one-sided property: for all i, A[i + 1] ≥ A[i] − 1, since a postorder listing can descend by at most one level at a time. Interestingly, this suffices for implementing the algorithm, at least with the micro structure of Section 5.2. Thus, this saving in memory incurs no loss in running time.
Remark.
This part of the solution is where the simplification with respect to [7] is most significant, although the outline (an initial, non-optimal solution, plus the usage of micro blocks) is similar.
6 The Level-Descendant and Level-Successor Queries
Observation 1 can easily be turned from ancestors to descendants:
Observation 7
Let ℓ > level(v). Vertex u is the first level-ℓ descendant of vertex v if and only if u is the first vertex after the first occurrence of v in the Euler tour such that level(u) = ℓ, provided that this vertex is a descendant of v. If it is not, v has no level-ℓ descendant.
By this observation, the level-descendant query reduces to a Find-Greater problem, analogous to Find-Smaller and solved in the same way, plus a test of descendance. Thus, to add this functionality, we use the same arrays E, L, and add a vector F maintaining the first occurrence of each vertex in the tour. We also need a search structure for "Find Greater." This structure is, of course, completely symmetric to the Find-Smaller structure, so no further explanation should be necessary (incidentally, the micro table à la Berkman–Vishkin can be shared). Testing for descendance is easy: u descends from v if and only if F(v) ≤ F(u) and R(u) ≤ R(v).
The level successor query is handled similarly, by the following observation:
Observation 8
Vertex u is the level successor of vertex v if and only if u is the first vertex after the last occurrence of v in the Euler tour such that level(u) = level(v).
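Both observations, together with the descendance test of the previous paragraph, can be illustrated with plain linear scans over the tour (a sketch; in the actual structure the scans are replaced by Find-Greater queries, and the names are mine):

```python
def tours(children, root=0):
    """Euler-tour arrays E, L (1-based) plus first (F) and last (R) occurrences."""
    n = len(children)
    E, L = [None], [None]
    F, R = [0] * n, [0] * n
    def dfs(v, lev):
        E.append(v); L.append(lev)
        F[v] = R[v] = len(E) - 1
        for c in children[v]:
            dfs(c, lev + 1)
            E.append(v); L.append(lev)
            R[v] = len(E) - 1
    dfs(root, 0)
    return E, L, F, R

def is_descendant(F, R, u, v):
    """Is u in the subtree of v?  Occurrence intervals of descendants nest."""
    return F[v] <= F[u] and R[u] <= R[v]

def level_descendant(E, L, F, R, v, lev):
    """First level-lev descendant of v (Observation 7): scan after F[v]."""
    for j in range(F[v] + 1, len(E)):
        if L[j] == lev:
            u = E[j]
            return u if is_descendant(F, R, u, v) else None
    return None

def level_successor(E, L, R, v):
    """Next vertex on v's level (Observation 8): scan after R[v]."""
    for j in range(R[v] + 1, len(E)):
        if L[j] == L[R[v]]:
            return E[j]
    return None
```

Note how the proviso of Observation 7 matters: the first level-lev vertex found after F[v] may belong to a later subtree, in which case v has no such descendant.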
7 Conclusion
I have described how to construct and query a data structure for answering Level Ancestor queries on trees. The algorithm is based on Berkman and Vishkin's Euler tour technique and is, in essence, a simplification of their PRAM algorithm. In contrast to the original, this version of the algorithm is simple and practical. The algorithm was implemented in C by Victor Buchnik; the code can be obtained from Amir Ben-Amram.
Another advantage of this algorithm is that it can easily be extended to support queries for Level Descendants and Level Successors.
References
[1] N. Alon and B. Schieber. Optimal preprocessing for answering on-line product queries. Technical report, Tel Aviv University, 1987.
[2] S. Alstrup, C. Gavoille, H. Kaplan, and T. Rauhe. Nearest common ancestors: A survey and a new algorithm for a distributed environment. Theory of Computing Systems, 37(3):441–456, 2004.
[3] S. Alstrup and J. Holm. Improved algorithms for finding level ancestors in dynamic trees. In U. Montanari, J. D. P. Rolim, and E. Welzl, editors, Proceedings of the 27th International Colloquium on Automata, Languages and Programming (ICALP), volume 1853 of LNCS, pages 73–84. Springer-Verlag, July 2000.
[4] P. Beame and F. E. Fich. Optimal bounds for the predecessor problem and related problems. J. Comput. Syst. Sci., 65(1):38–72, 2002.
[5] M. A. Bender and M. Farach-Colton. The level ancestor problem simplified. Theor. Comput. Sci., 321(1):5–12, 2004.
[6] O. Berkman and U. Vishkin. Recursive *-tree parallel data-structure. In 30th Annual Symposium on Foundations of Computer Science, pages 196–202. IEEE, 1989.
[7] O. Berkman and U. Vishkin. Finding level-ancestors in trees. J. Computer and System Sciences, 48(2):214–230, 1994.
[8] P. F. Dietz. Finding level-ancestors in dynamic trees. In Workshop on Algorithms and Data Structures (WADS), pages 32–40, 1991.
[9] J. Fischer and V. Heun. A new succinct representation of RMQ-information and improvements in the enhanced suffix array. In Proceedings of the International Symposium on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies (ESCAPE'07), volume ?? of Lecture Notes in Computer Science, pages ??–??. Springer, 2007.
[10] H. N. Gabow and R. E. Tarjan. A linear time algorithm for a special case of disjoint set union. J. Comput. Syst. Sci., 30:209–221, 1985.
[11] R. F. Geary, R. Raman, and V. Raman. Succinct ordinal trees with level-ancestor queries. In J. I. Munro, editor, Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2004, New Orleans, Louisiana, USA, January 11–14, 2004, pages 1–10. SIAM, 2004.
[12] D. Harel and R. E. Tarjan. Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing, 13(2):338–355, May 1984.
[13] J. I. Munro and S. S. Rao. Succinct representations of functions. In J. Díaz, J. Karhumäki, A. Lepistö, and D. Sannella, editors, Automata, Languages and Programming: 31st International Colloquium, ICALP 2004, Turku, Finland, July 12–16, 2004, Proceedings, volume 3142 of Lecture Notes in Computer Science, pages 1006–1015. Springer, 2004.
[14] H. Yuan and M. J. Atallah. Efficient distributed third-party data authentication for tree hierarchies. In 28th IEEE International Conference on Distributed Computing Systems (ICDCS '08), pages 184–193. IEEE Computer Society, 2008.
[15] H. Yuan and M. J. Atallah. Efficient data structures for range-aggregate queries on trees. In R. Fagin, editor, Database Theory – ICDT 2009, 12th International Conference, St. Petersburg, Russia, Proceedings, volume 361 of ACM International Conference Proceeding Series, pages 111–120. ACM, 2009.