
# How to answer a small batch of RMQs or LCA queries in practice

## Abstract

In the Range Minimum Query (RMQ) problem, we are given an array $A$ of $n$ numbers and we are asked to answer queries of the following type: for indices $i$ and $j$ between $0$ and $n-1$, query $\text{RMQ}_A(i,j)$ returns the index of a minimum element in the subarray $A[i..j]$. Answering a small batch of RMQs is a core computational task in many real-world applications, in particular due to the connection with the Lowest Common Ancestor (LCA) problem. By small batch, we mean that the number $q$ of queries is $o(n)$ and we have them all at hand. It is therefore not relevant to build an $\Omega(n)$-sized data structure or spend $\Omega(n)$ time to build a more succinct one. It is well known, among practitioners and elsewhere, that these data structures for online querying carry high constants in their pre-processing and querying time. We would thus like to answer this batch efficiently in practice. By efficiently in practice, we mean that we (ultimately) want to spend $n+O(q)$ time and $O(q)$ space. We write $n$ to stress that the number of operations per entry of $A$ should be a very small constant. Here we show how existing algorithms can be easily modified to satisfy these conditions. The presented experimental results highlight the practicality of this new scheme. The most significant improvement is obtained for answering a small batch of LCA queries. A library implementation of the presented algorithms is made available.

## 1 Introduction

In the Range Minimum Query (RMQ) problem, we are given an array $A$ of $n$ numbers and we are asked to answer queries of the following type: for indices $i$ and $j$ between $0$ and $n-1$, query $\text{RMQ}_A(i,j)$ returns the index of a minimum element in the subarray $A[i..j]$.

The RMQ problem and the linearly equivalent Lowest Common Ancestor (LCA) problem [4] are very well studied, and several optimal algorithms exist to solve them. It was first shown by Harel and Tarjan [14] that a tree can be pre-processed in $O(n)$ time so that LCA queries can be answered in $O(1)$ time per query. A major breakthrough in practicable constant-time LCA computation was made by Berkman and Vishkin [6]. Bender and Farach-Colton [4] further simplified this algorithm by showing that the RMQ problem is linearly equivalent to the LCA problem (shown also in [10]). The constants due to the reduction, however, remained quite large, making these algorithms impractical in most realistic cases. To this end, Fischer and Heun [9] presented yet another optimal, but also direct, algorithm for the RMQ problem. The same authors (but also others [15]) showed that, due to large constants in the pre-processing and querying time, implementations of this algorithm are often slower than implementations of the naive ones. Continuous efforts for engineering these solutions are being made [8].

In this article we try to address this problem, in particular when one wants to answer a relatively small batch of RMQs efficiently. This version of the problem is a core computational task in many real-world applications, such as object inheritance during static compilation of code [5] or several string matching problems (see Section 5 for some). By small batch, we mean that the number $q$ of queries is $o(n)$ and we have them all at hand. It is therefore not relevant to build an $\Omega(n)$-sized data structure or spend $\Omega(n)$ time to build a more succinct one. It is well known, among practitioners and elsewhere, that these data structures carry high constants in both their pre-processing and querying time. (Note that when $q=\Omega(n)$ one can use these data structures for this computation.) We would thus like to answer this batch efficiently in practice. By efficiently in practice, we mean that we (ultimately) want to spend $n+O(q)$ time and $O(q)$ space. We write $n$ to stress that the number of operations per entry of $A$ should be a very small constant, e.g. scanning the array once or twice. In what follows, we show how existing algorithms can be easily modified to satisfy these conditions. Experimental results presented here highlight the practicality of this scheme. The most significant improvement is obtained for answering a small batch of LCA queries. The RMQ Batch problem can be defined as follows.

RMQ-Batch. Input: an array $A$ of size $n$ of numbers and a list $Q$ of $q$ pairs of indices $(i,j)$, $0\le i\le j\le n-1$. Output: $\text{RMQ}_A(i,j)$ for all $(i,j)\in Q$.

The LCA Queries Batch problem can be defined as follows.

LCA-Queries-Batch. Input: a rooted tree $T$ on $n$ nodes and a list $Q$ of $q$ pairs of nodes $(u,v)$ of $T$. Output: $\text{LCA}_T(u,v)$ for all $(u,v)\in Q$.

We assume the word-RAM model with word size $w=\Omega(\log n)$. For the RMQ Batch problem, we assume that we are given a rewritable array $A$ of size $n$, each entry of which may be increased by $n$ and still fit in a computer word. For the LCA Queries Batch problem, we assume that we are given (an $O(n)$-sized representation of) a rewritable tree $T$ allowing constant-time access to (at least) the nodes of $T$ that are in some query in $Q$ (see the representation in [12], for instance). All presented algorithms are deterministic.

## 2 Contracting the Input Array

Consider any two adjacent array entries $A[i]$ and $A[i+1]$. Observe that if no query in $Q$ starts or ends at $i$ or at $i+1$ then, if $A[i]\le A[i+1]$, $i+1$ will never be the answer to any of the queries in $Q$, since every query containing $i+1$ also contains $i$. Hence, the idea is to contract array $A$ so that each block that does not contain the left or right endpoint of any query gets replaced by one element: its minimum. A similar idea, based on sorting the list $Q$, has been considered in the External Memory model [1] (see also [2]). In this section, we present a solution for our computational model which avoids using $\Omega(n)$ extra space, but also avoids the $O(q\log q)$ time needed to sort $Q$.

There are some technical details involved in updating the queries for $A$ into queries for the new array using only $n+O(q)$ time and $O(q)$ extra space. We first scan the array once and find $\max(A)$. We also create two auxiliary arrays $D$ and $C$. For each query $(i,j)\in Q$ we mark positions $i$ (and $j$) in the array $A$ as follows. If $A[i]\le\max(A)$, then $i$ has not been marked before. Let this be the $k$-th position, $1\le k\le 2q$, that gets marked (we just store a counter for that). We store $A[i]$ in $D[k]$ and replace the value that is stored in $A[i]$ by $\max(A)+k$. We also start a linked list at $C[k]$, where we insert a pointer to query $(i,j)$, so that we can update it later. If $A[i]>\max(A)$, then the position $i$ has already been marked; we just add a pointer to the respective query in the linked list starting at $C[A[i]-\max(A)]$.

We then scan array $A$ again and create a new array $A_Q$ as follows: for each marked position $i$ (i.e. $A[i]>\max(A)$), we copy the original value (i.e. $D[A[i]-\max(A)]$) in $A_Q$, while each maximal block in $A$ that does not contain a marked position is replaced by a single entry: its minimum. When we insert the original entry of a marked position $i$ of $A$ (i.e. $D[A[i]-\max(A)]$) in $A_Q$ at position $k$, we go through the linked list that is stored in $C[A[i]-\max(A)]$, where we have stored pointers to all the queries of the form $(i,j)$ or $(j,i)$, and replace $i$ by $k$ in each of them. Thus, after we have scanned $A$, for each query $(i,j)$ on $A$, we will have stored the respective pair $(i',j')$ on $A_Q$. Note that we need to scan array $A$ only once if we know $\max(A)$ a priori (e.g. in the LCP array [7]), or twice otherwise.

While creating $A_Q$, we also store in an auxiliary array the function $f$ that maps positions of $A_Q$ to the respective original positions in $A$.

Now notice that $A_Q$ and the auxiliary arrays are all of size $O(q)$, since in the worst case we mark $2q$ distinct elements of $A$ and contract $2q+1$ blocks that do not contain a marked position. (We can actually throw away everything before the first marked position and everything after the last marked position and get at most $2q-1$ such blocks instead.) The whole procedure takes $n+O(q)$ time and $O(q)$ extra space. Note that if $\text{RMQ}_{A_Q}(i',j')=k$ then $\text{RMQ}_A(i,j)=f(k)$.

We can finally retrieve the original input array $A$, if required, by replacing $A[f(k)]$ by $A_Q[k]$ for every $k$ in the domain of $f$ in $O(q)$ time.
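The two scans of this section can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' library code. The names `contract` and `naive_rmq` are hypothetical, Python lists stand in for the linked lists $C[k]$, indices are 0-based, and ties are broken towards the leftmost minimum.

```python
from typing import List, Optional, Tuple


def naive_rmq(arr: List[int], i: int, j: int) -> int:
    """Index of the leftmost minimum of arr[i..j] (brute force, for checking)."""
    return min(range(i, j + 1), key=lambda p: arr[p])


def contract(A: List[int], queries: List[Tuple[int, int]]):
    """Contract A around the query endpoints (sketch of Section 2).

    Returns (AQ, f, new_queries): AQ is the contracted array, f[k] is the
    position in A of AQ[k], and new_queries[t] is query t rewritten on AQ.
    A is temporarily overwritten and restored before returning.
    """
    m = max(A)                                # first scan: max(A)
    D: List[int] = []                         # D[k-1]: original value of the k-th marked position
    C: List[List[Tuple[int, int]]] = []       # C[k-1]: (query id, endpoint slot) pairs to update
    new_queries = [list(qr) for qr in queries]
    for t, (i, j) in enumerate(queries):
        for slot, p in ((0, i), (1, j)):
            if A[p] <= m:                     # not marked yet: encode the counter k in A[p]
                D.append(A[p])
                C.append([])
                A[p] = m + len(D)
            C[A[p] - m - 1].append((t, slot))

    AQ: List[int] = []
    f: List[int] = []
    best: Optional[Tuple[int, int]] = None    # (value, position) of the current unmarked block
    for p, x in enumerate(A):                 # second scan: build AQ
        if x > m:                             # marked position
            if best is not None:              # close the block before it
                AQ.append(best[0])
                f.append(best[1])
                best = None
            k = x - m
            for t, slot in C[k - 1]:          # rewrite the endpoints that point here
                new_queries[t][slot] = len(AQ)
            AQ.append(D[k - 1])
            f.append(p)
        elif best is None or x < best[0]:     # strict '<' keeps the leftmost minimum
            best = (x, p)
    if best is not None:
        AQ.append(best[0])
        f.append(best[1])

    for k, p in enumerate(f):                 # restore A: A[f(k)] = AQ[k]
        A[p] = AQ[k]
    return AQ, f, [tuple(qr) for qr in new_queries]
```

The answer to an original query is then `f[r]`, where `r` is any RMQ answer on `AQ`. Here `naive_rmq` is only a placeholder for the methods of Section 3.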

## 3 Small RMQ Batch

### 3.1 An $n+O(q\log q)$-time and $O(q)$-space Algorithm

The algorithm presented in this section is a modification of the Sparse Table algorithm by Bender and Farach-Colton [4] applied on array $A_Q$; we denote it by . The modification is based on the fact that (i) we do not want to consume $\Omega(q\log q)$ extra space to answer the queries; and (ii) we do not want to necessarily do all the pre-processing work of the algorithm in [4], which is designed to answer any of the $O(q^2)$ possible queries online. We denote this modified algorithm by and formalise it below.

The idea is to first put each $(i,j)\in Q$ with $i<j$ in a bucket $B_k$ based on the $k$ for which $2^k\le j-i<2^{k+1}$; we can have at most $\lfloor\log|A_Q|\rfloor+1$ such buckets. In this process, if we find queries of the form $(i,i)$, we answer them on the spot. We can do this in $O(q)$ time.

We then create an array $M$ of size $|A_Q|$ where we will store 2-tuples $(v,p)$. In Step $k$, $M[i]$ will store the minimum value $v$ across $A_Q[i..i+2^k-1]$, as well as the position $p$ where it occurs. We initialise it as $M[i]=(A_Q[i],i)$ and then update it using the doubling technique. In Step $0$ we answer all (trivial) queries that are stored in $B_0$; they are of the form $(i,i+1)$ and the answer can be found by looking at $\min(M[i],M[i+1])$ (note that we compare elements of $M$ lexicographically). When we are done with $B_0$ we update $M$ by setting $M[i]=\min(M[i],M[i+1])$ for all $0\le i<|A_Q|-1$.

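Generally, in Step $k$, we answer the queries of $B_k$ as follows. For query $(i,j)$, we find the answer by obtaining $(v,p)=\min(M[i],M[j-2^k+1])$. We then return $p$. The point is that $[i,i+2^k-1]\cup[j-2^k+1,j]=[i,j]$, because $2^k\le j-i<2^{k+1}$. When we are done with $B_k$ we set $M[i]=\min(M[i],M[i+2^k])$ if $i+2^k<|A_Q|$, processing $i$ in increasing order so that $M[i+2^k]$ is read before it is overwritten.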

We do this until we have gone through all non-empty buckets (i.e. at most $\lfloor\log|A_Q|\rfloor+1$ steps). Updating $M$ takes $O(|A_Q|)$ time in each step, and we need in total $O(q)$ time for the queries. We thus need $O(|A_Q|\log|A_Q|+q)$ time for this part of the algorithm. Since $|A_Q|=O(q)$, this time is $O(q\log q)$. The overall time complexity of the algorithm is thus $n+O(q\log q)$. Notably, the space required is only $O(q)$, as we overwrite $M$ in each step.
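Here is a minimal Python sketch of the bucketing and the in-place doubling. It assumes 0-based indices and compares (value, position) pairs lexicographically, so ties resolve to the leftmost minimum. The function name `batch_rmq_doubling` is hypothetical. The sketch accepts any array; in the algorithm above it would be run on $A_Q$, where the table `M` has $O(q)$ entries.

```python
from typing import List, Tuple


def batch_rmq_doubling(A: List[int], queries: List[Tuple[int, int]]) -> List[int]:
    """Answer a batch of RMQs offline; returns leftmost-minimum indices.

    Queries are bucketed by k with 2^k <= j - i < 2^(k+1). Before the
    queries of bucket k are answered, M[i] holds the smallest (value,
    position) pair of A[i .. i + 2^k - 1] (truncated at the end of A).
    M is overwritten in place.
    """
    n = len(A)
    ans = [0] * len(queries)
    buckets: List[List[int]] = []
    for t, (i, j) in enumerate(queries):
        if i == j:                            # trivial query, answered on the spot
            ans[t] = i
            continue
        k = (j - i).bit_length() - 1          # floor(log2(j - i))
        while len(buckets) <= k:
            buckets.append([])
        buckets[k].append(t)

    M = [(A[i], i) for i in range(n)]         # windows of length 2^0
    for k, bucket in enumerate(buckets):
        for t in bucket:
            i, j = queries[t]
            # the two windows of length 2^k starting at i and ending at j cover [i, j]
            ans[t] = min(M[i], M[j - (1 << k) + 1])[1]
        if k + 1 < len(buckets):              # doubling step: windows of length 2^(k+1)
            half = 1 << k
            for i in range(n - half):         # increasing i: M[i + half] is still the old value
                M[i] = min(M[i], M[i + half])
    return ans
```

Empty buckets between non-empty ones still trigger a doubling step, which keeps `M` aligned with the step number.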

### 3.2 $n+O(q)$-time and $O(q)$-space Algorithms

Offline-based algorithm. Given an array $A$ of $n$ numbers, its Cartesian tree is defined as follows. The root of the Cartesian tree is $A[i]=\min\{A[0],\ldots,A[n-1]\}$, its left subtree is computed recursively on $A[0..i-1]$ and its right subtree on $A[i+1..n-1]$. An LCA instance can be obtained from an RMQ instance on an array $A$ by letting $T$ be the Cartesian tree of $A$, which can be constructed in $O(n)$ time [10]. It is easy to see that $\text{RMQ}_A(i,j)$ in $A$ translates to $\text{LCA}_T(i,j)$ in $T$, where node $i$ of $T$ corresponds to entry $A[i]$. The first step of this algorithm is to create array $A_Q$ in $n+O(q)$ time as in Section 2. The second step is to construct the Cartesian tree of $A_Q$ in $O(q)$ time and $O(q)$ extra space. Finally, we apply the offline algorithm by Gabow and Tarjan [11] to answer the $q$ queries in $O(q)$ time and $O(q)$ extra space. This takes overall $n+O(q)$ time and $O(q)$ extra space. We denote this algorithm by . We denote by the same algorithm applied on array $A$.
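The offline pipeline (Cartesian tree, then offline LCA) can be sketched as follows. One substitution is made and should be noted. The linear-time algorithm of Gabow and Tarjan [11] relies on a special-case union-find; this sketch instead uses Tarjan's classic offline LCA with a generic union-find (path halving), so it is only near-linear. The function names are hypothetical. Ties in $A$ are broken so that the leftmost minimum becomes the ancestor.

```python
from typing import List, Tuple


def cartesian_tree(A: List[int]):
    """Stack-based O(n) construction; returns (root, left, right, parent), -1 = none.

    Popping only strictly larger values makes the leftmost minimum the ancestor
    among equal values, so LCA(i, j) is the leftmost minimum of A[i..j].
    """
    n = len(A)
    left, right, par = [-1] * n, [-1] * n, [-1] * n
    stack: List[int] = []                      # right spine of the tree built so far
    for i in range(n):
        last = -1
        while stack and A[stack[-1]] > A[i]:
            last = stack.pop()
        left[i] = last
        if last != -1:
            par[last] = i
        if stack:
            right[stack[-1]] = i
            par[i] = stack[-1]
        stack.append(i)
    return stack[0], left, right, par


def offline_rmq_via_lca(A: List[int], queries: List[Tuple[int, int]]) -> List[int]:
    """Tarjan's offline LCA on the Cartesian tree of a non-empty array A."""
    root, left, right, par = cartesian_tree(A)
    n = len(A)
    uf = list(range(n))                        # union-find parent pointers
    anc = list(range(n))                       # anc[set representative] = current ancestor
    finished = [False] * n
    pending: List[List[Tuple[int, int]]] = [[] for _ in range(n)]
    for t, (i, j) in enumerate(queries):
        pending[i].append((j, t))
        pending[j].append((i, t))
    ans = [-1] * len(queries)

    def find(x: int) -> int:
        while uf[x] != x:
            uf[x] = uf[uf[x]]                  # path halving
            x = uf[x]
        return x

    stack = [(root, False)]                    # iterative post-order DFS
    while stack:
        u, expanded = stack.pop()
        if not expanded:
            stack.append((u, True))
            for c in (right[u], left[u]):
                if c != -1:
                    stack.append((c, False))
            continue
        finished[u] = True                     # u's whole subtree is merged into u's set
        for v, t in pending[u]:
            if finished[v]:
                ans[t] = anc[find(v)]
        if par[u] != -1:                       # merge u's subtree into its parent's set
            p = par[u]
            uf[find(u)] = find(p)
            anc[find(p)] = p
    return ans
```

In the offline-based algorithm this would be applied to $A_Q$ and the resulting positions mapped back through $f$.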

Online-based algorithm. The first step of this algorithm is to create array $A_Q$ in $n+O(q)$ time as in Section 2. We can then apply the algorithm by Fischer and Heun [9] on array $A_Q$ to obtain overall an $n+O(q)$-time and $O(q)$-space algorithm. We denote this algorithm by . We denote by the same algorithm applied on array $A$.

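Note that in the case when $q=\Omega(n)$, i.e. the batch is not so small, we can choose to apply algorithm or algorithm on array $A$ directly, thus obtaining an algorithm that always works in $O(n+q)$ time and $O(\min\{n,q\})$ extra space. We therefore obtain the following result asymptotically.

Theorem 1. The RMQ Batch problem can be solved in $O(n+q)$ time and $O(\min\{n,q\})$ extra space.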

## 4 Small LCA Queries Batch

In the LCA problem, we are given a rooted tree $T$ having $n$ labelled nodes and we are asked to answer queries of the following type: for nodes $u$ and $v$, query $\text{LCA}_T(u,v)$ returns the node furthest from the root that is an ancestor of both $u$ and $v$. There exists a time-optimal algorithm by Gabow and Tarjan [11] to answer a batch of $q$ LCA queries in $O(n+q)$ time and $O(n)$ extra space. We denote this algorithm by . In this section, we present a simple but non-trivial algorithm for improving this, for $q=o(n)$, to $O(n+q)$ time and $O(q)$ extra space.

It is well known (see [4] for the details) that an RMQ instance can be obtained from an LCA instance on a tree $T$ by writing down the depths of the nodes visited during an Euler tour of $T$. That is, the tour $E$ is obtained by listing all node visitations in a depth-first search (DFS) traversal of $T$ starting from the root. The LCA of two nodes translates to an RMQ (where we compare nodes based on their level) between the first occurrences of these nodes in $E$.
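The Euler-tour reduction can be illustrated with a short Python sketch. The tree is assumed to be given as a dictionary of child lists, a hypothetical representation chosen for brevity. Levels are depths from the root, and a naive scan stands in for the RMQ.

```python
from typing import Dict, List, Tuple


def euler_tour(children: Dict[int, List[int]], root: int):
    """Return (E, L, first): visited nodes, their levels, and first occurrences."""
    E: List[int] = []
    L: List[int] = []
    first: Dict[int, int] = {}
    stack: List[Tuple[int, int, int]] = [(root, 0, 0)]   # (node, level, next child index)
    while stack:
        u, lev, k = stack.pop()
        if k == 0:                                 # first visit of u
            first[u] = len(E)
        E.append(u)
        L.append(lev)
        kids = children.get(u, [])
        if k < len(kids):
            stack.append((u, lev, k + 1))          # come back to u after this child
            stack.append((kids[k], lev + 1, 0))
    return E, L, first


def lca_via_euler(E: List[int], L: List[int], first: Dict[int, int], u: int, v: int) -> int:
    i, j = sorted((first[u], first[v]))
    p = min(range(i, j + 1), key=lambda x: L[x])   # RMQ on the levels
    return E[p]
```

For a tree with $n$ nodes the tour has $2n-1$ entries, which is exactly what the contraction below avoids materialising.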

We proceed largely as in Section 2, assuming that the nodes of $T$ are labelled $1,\ldots,n$. For each query $(u,v)\in Q$, we mark nodes $u$ (and $v$) in $T$ as follows. If $u\le n$, then $u$ has not been marked before. Let this be the $k$-th node, $1\le k\le 2q$, that gets marked (we just store a counter for that). We also create two arrays $D$ and $C$. We store $u$ in $D[k]$ and replace the label $u$ by $n+k$. We also start a linked list at $C[k]$, where we insert a pointer to query $(u,v)$, so that we can update it later. If $u>n$, the node has already been marked, and we just add a pointer to the respective query in the linked list starting at $C[u-n]$.

We then do a single DFS traversal on $T$ and create two new arrays $E_Q$ and $L_Q$ as follows. When a marked node $u$ (i.e. $u>n$) is visited for the first time, we write down in $E_Q$ its original label (i.e. $D[u-n]$), while for each maximal sequence of visited nodes that are not marked (only the first visit of a marked node counts as marked) we write down a single entry: the one with the minimum tree level. At the same time, we store in $L_Q$ the level of the node added in $E_Q$. While creating $E_Q$, we also store in an auxiliary array the function $f$ that maps positions of $E_Q$ to the respective node labels in $T$.

When we insert the original entry of a marked node $u$ of $T$ (i.e. $D[u-n]$) in $E_Q$ at position $k$, we go through the linked list that is stored in $C[u-n]$, where we have stored pointers to all the queries of the form $(u,v)$ or $(v,u)$, and replace $u$ by $k$ in each of these queries. Thus, after we have finished the traversal on $T$, for each LCA query $(u,v)$ on $T$, we will have stored the respective RMQ pair $(i,j)$ on $L_Q$, where $i$ (resp. $j$) corresponds to the first occurrence of node $u$ (resp. $v$) in the traversal. Thus we traverse $T$ only once.

Now notice that $E_Q$, $L_Q$ and the auxiliary arrays are all of size $O(q)$, since in the worst case we mark $2q$ distinct nodes of $T$ and contract $2q+1$ sequences of visited nodes that do not contain a marked node. (We can actually throw away everything before the first marked node and everything after the last marked node and get at most $2q-1$ such sequences instead.) The whole procedure takes $O(n)$ time and $O(q)$ extra space. We are now in a position to apply algorithm on $L_Q$ to obtain the final bound. To answer the queries, note that if $\text{RMQ}_{L_Q}(i,j)=k$ then $\text{LCA}_T(u,v)=f(k)$. We denote this algorithm by . Alternatively, we can apply algorithm on $L_Q$ to solve this problem in $O(n+q\log q)$ time and $O(q)$ extra space; we denote this algorithm by .
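The contraction during the traversal can be sketched as below. The sketch makes several simplifying assumptions for brevity:

- Nodes are dictionary keys.
- A dictionary `mark` replaces the relabelling $u\mapsto n+k$.
- `EQ` stores node identifiers directly, so it also plays the role of $f$.
- Query endpoints are rewritten at the end through a small array instead of the linked lists.

Only the first visit of a marked node is kept; every other visit joins the current run. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def contracted_euler(children: Dict[int, List[int]], root: int,
                     queries: List[Tuple[int, int]]):
    """Single DFS that builds (EQ, LQ) and rewrites each LCA query as an RMQ on LQ."""
    mark: Dict[int, int] = {}                  # node -> k (the text stores n + k in the label)
    for u, v in queries:
        for x in (u, v):
            if x not in mark:
                mark[x] = len(mark)
    pos_of = [-1] * len(mark)                  # k -> position of the node's first visit in EQ
    EQ: List[int] = []                         # node ids, so EQ also plays the role of f
    LQ: List[int] = []                         # their levels
    best: Optional[Tuple[int, int]] = None     # (level, node) of the current unmarked run
    stack = [(root, 0, 0)]                     # (node, level, next child index)
    while stack:
        u, lev, k = stack.pop()
        if k == 0 and u in mark:               # first visit of a marked node
            if best is not None:               # close the run before it
                EQ.append(best[1])
                LQ.append(best[0])
                best = None
            pos_of[mark[u]] = len(EQ)
            EQ.append(u)
            LQ.append(lev)
        elif best is None or lev < best[0]:    # keep the shallowest visit of the run
            best = (lev, u)
        kids = children.get(u, [])
        if k < len(kids):
            stack.append((u, lev, k + 1))
            stack.append((kids[k], lev + 1, 0))
    if best is not None:
        EQ.append(best[1])
        LQ.append(best[0])
    new_q = []
    for u, v in queries:
        i, j = pos_of[mark[u]], pos_of[mark[v]]
        new_q.append((min(i, j), max(i, j)))
    return EQ, LQ, new_q


def batch_lca(children: Dict[int, List[int]], root: int,
              queries: List[Tuple[int, int]]) -> List[int]:
    EQ, LQ, new_q = contracted_euler(children, root, queries)
    # naive RMQ on LQ as a placeholder; any method of Section 3 could be used instead
    return [EQ[min(range(i, j + 1), key=lambda x: LQ[x])] for i, j in new_q]
```

A run may contain later visits of marked nodes. This is harmless, because between two first occurrences the shallowest visit is always the LCA.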

We can finally retrieve the original input tree $T$, if required, by replacing the label of node $f(k)$ by $E_Q[k]$ for every $k$ in the domain of $f$ in $O(q)$ time.

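Note that in the case when $q=\Omega(n)$, i.e. the batch is not so small, we can choose to apply algorithm on tree $T$ directly, thus obtaining an algorithm that always works in $O(n+q)$ time and $O(\min\{n,q\})$ extra space. We therefore obtain the following result asymptotically.

Theorem 2. The LCA Queries Batch problem can be solved in $O(n+q)$ time and $O(\min\{n,q\})$ extra space.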

## 5 Applications

We consider the well-known application of answering LCA queries on the suffix tree of a string. The suffix tree $\text{ST}(S)$ of a non-empty string $S$ of length $n$ is a compact trie representing all suffixes of $S$ (see [7] for details). The nodes of the trie which become nodes of the suffix tree are called explicit nodes, while the other nodes are called implicit. Each edge of the suffix tree can be viewed as an upward maximal path of implicit nodes starting with an explicit node. Moreover, each node belongs to a unique path of that kind. Then, each node of the trie can be represented in the suffix tree by the edge it belongs to and an index within the corresponding path. The path-label of a node $v$ is the concatenation of the edge labels along the path from the root to $v$. The nodes whose path-label corresponds to a suffix of $S$ are called terminal. Given two terminal nodes $u$ and $v$ in $\text{ST}(S)$, representing suffixes $S[i..n-1]$ and $S[j..n-1]$, the string depth of node $\text{LCA}(u,v)$ corresponds to the length of their longest common prefix, also known as their longest common extension (LCE) [15].
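The sketch below illustrates LCE queries being answered through RMQs. It uses the suffix array and the LCP array in place of the suffix tree, since the string depth of $\text{LCA}(u,v)$ equals the minimum LCP value between the ranks of the two suffixes. The naive suffix-array construction and the function names are assumptions for illustration. The LCP array follows the linear-time scan of Kasai et al.

```python
from typing import List, Tuple


def suffix_array(s: str) -> List[int]:
    # naive construction, O(n^2 log n); fine for illustration only
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s: str, sa: List[int], rank: List[int]) -> List[int]:
    """Kasai et al.: lcp[r] = LCP of the suffixes at ranks r-1 and r (lcp[0] = 0)."""
    n, h = len(s), 0
    lcp = [0] * n
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1
    return lcp


def lce_batch(s: str, queries: List[Tuple[int, int]]) -> List[int]:
    sa = suffix_array(s)
    rank = [0] * len(s)
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = lcp_array(s, sa, rank)
    out = []
    for i, j in queries:
        if i == j:
            out.append(len(s) - i)
            continue
        a, b = sorted((rank[i], rank[j]))
        out.append(min(lcp[a + 1:b + 1]))   # an RMQ on the LCP array
    return out
```

Each LCE query becomes one RMQ on the LCP array, so a small batch of them is an instance of the RMQ Batch problem.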

In many textbook solutions for classical string matching problems (e.g. maximal palindromic factors, approximate string matching with $k$-mismatches, approximate string matching with $k$-differences, online string search with the suffix array, etc.) we have that $q=\Omega(n)$ and/or the queries have to be answered online. In other algorithms, however, $q$ can be much smaller on average (in practice) and the queries can be answered offline. We describe here a few such solutions. The common idea, as in many fast average-case algorithms, is to minimise the number of queries by filtering out queries that can never lead to a valid solution.

Text indexing. Suppose we are given the suffix tree $\text{ST}(T)$ of a text $T$ of length $n$ and we are asked to create the suffix links for the internal nodes. This may be necessary if the construction algorithm does not compute suffix links (e.g. construction via the suffix array) but they are needed for an application of interest. The suffix link of a node $v$ with path-label $\alpha Y$ is a pointer to the node path-labelled $Y$, where $\alpha$ is a single letter and $Y$ is a string. The suffix link of $v$ exists if $v$ is a non-root internal node of $\text{ST}(T)$. The suffix links can be computed as follows. The first step is to mark each internal node $v$ of the suffix tree with a pair of leaves $(i,j)$ such that leaves labelled $i$ and $j$ are in subtrees rooted at different children of $v$. This can be done by a DFS traversal of the tree. (Note that if an internal node has only one child then it must be terminal; assume that it represents the suffix $T[i..n-1]$. We thus create a suffix link to the node representing $T[i+1..n-1]$.) Given an internal node $v$ marked with $(i,j)$, note that $v=\text{LCA}(i,j)$, and let $\alpha Y$ be its path-label. To create the suffix link from $v$, node $u$ with path-label $Y$ can be obtained by the query $\text{LCA}(i+1,j+1)$. We can create a batch of LCA queries consisting of all such pairs. Note that in randomly generated texts, the number of internal nodes of $\text{ST}(T)$ is $n/H$ on average, where $H$ is the alphabet's entropy [20]; thus the standard $O(n)$-time and $O(n)$-space solution to this problem, building the LCA data structure over $\text{ST}(T)$ [4], is not satisfactory.

Finding frequent gapped factors in texts. We are given a text $T$ of length $n$, and positive integers $\ell$, $d_1$, $d_2$, and $\mu$. The problem is to find all couples $(u_1,u_2)$, with $|u_1|=|u_2|=\ell$, such that string $u_1\alpha u_2$, for any string $\alpha$ (known as gap or spacer) with $d_1\le|\alpha|\le d_2$, occurs in $T$ at least $\mu$ times [16]. The first step is to build $\text{ST}(T)$. We then locate all subtrees rooted at an explicit node with string depth at least $\ell$ and whose parent has string depth less than $\ell$, corresponding to factors of length $\ell$ repeated in $T$. From these subtrees, we only consider the ones with at least $\mu$ terminal nodes. Note that if $\ell$ is large enough, we may have only a few such subtrees. For each subtree with $r$ terminal nodes, representing suffixes $T[i_1..n-1],\ldots,T[i_r..n-1]$, we create a batch of LCA queries between all pairs and report occurrences when LCA queries extend pairwise matches after the gap to length at least $\ell$ for a set of at least $\mu$ such suffixes. (This algorithm can be easily generalised for any number of gaps.)

Pattern matching on weighted sequences. A weighted sequence specifies the probability of occurrence of each letter of the alphabet for every position. A weighted sequence thus represents many different strings, each with the probability of occurrence equal to the product of probabilities of its letters at subsequent positions of the weighted sequence. The problem is to find all occurrences of a (standard) pattern $P$ of length $m$ with probability at least $1/z$ in a weighted sequence $X$ of length $n$ [17]. The first step is to construct the heavy string of $X$, denoted by $H$, by assigning to $H[i]$ the most probable letter of $X[i]$ (resolving ties arbitrarily). The second step is to build $\text{ST}(S)$, $S=PH$. We can then compute the first mismatch between $P$ and every length-$m$ substring of $H$. Note that the number of positions in $X$ where two or more letters occur with probability at least $1/z$ can be small, and so we consider only these positions to cause a legitimate mismatch between $P$ and a factor of $H$. We then use batches of $O(\log z)$ LCA queries per such starting position to extend a match to length at least $m$. This is because $P$ cannot match $X$ with probability at least $1/z$ if more than $\log_2 z$ mismatches occur between $P$ and $H$ [17].
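The filtering argument can be checked with a small Python sketch. A weighted sequence is modelled as a list of letter-to-probability dictionaries, which is our assumption. Mismatches against the heavy string are counted letter by letter as a stand-in for the LCA-based jumps. Candidates with more than $\lfloor\log_2 z\rfloor$ mismatches are discarded before their probability is computed. The function names are hypothetical, and $z\ge 1$ is assumed.

```python
import math
from typing import Dict, List

Weighted = List[Dict[str, float]]   # X[i] maps each letter to its probability at position i


def heavy_string(X: Weighted) -> str:
    # most probable letter per position; ties go to the first letter listed
    return "".join(max(col, key=col.get) for col in X)


def weighted_occurrences(X: Weighted, P: str, z: float) -> List[int]:
    """Starting positions where P occurs in X with probability at least 1/z."""
    H = heavy_string(X)
    m = len(P)
    limit = math.floor(math.log2(z))          # more mismatches than this: probability < 1/z
    out = []
    for s in range(len(X) - m + 1):
        mism = 0
        for t in range(m):                     # stand-in for LCE jumps between P and H
            if P[t] != H[s + t]:
                mism += 1
                if mism > limit:
                    break
        if mism > limit:
            continue                           # filtered out without computing probabilities
        prob = 1.0
        for t in range(m):
            prob *= X[s + t].get(P[t], 0.0)
        if prob >= 1.0 / z:
            out.append(s)
    return out
```

The filter is safe because a letter other than the heaviest one at a position has probability at most $1/2$. Hence more than $\log_2 z$ mismatches force a probability below $1/z$.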

Pattern matching with don’t care letters. We are given a pattern $P$ of length $m$, with letters from alphabet $\Sigma$ and $k$ occurrences of a don’t care letter $\ast$ (matching itself and any letter from $\Sigma$), and a text $T$ of length $n$. The problem is to find all occurrences of $P$ in $T$ [18]. The first step is to build $\text{ST}(S)$, $S=P'T$, where $P'$ is the string obtained from $P$ by replacing don’t care letters with a letter $\#\notin\Sigma$. We then locate the subtree rooted at the highest explicit node corresponding to the longest factor $F$ of $P$ without $\ast$’s. We also locate, in the same subtree, all terminal nodes corresponding to starting positions of $F$ in $T$. Note that if $F$ is long enough, we may have only a few such nodes. Since we know where the don’t care letters occur in $P$, we can create a batch of LCA queries. An occurrence is then reported when LCA queries extend a match to length at least $m$. (This algorithm can be easily generalised for any number of patterns.)
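Here is a sketch of the don’t-care filter. It anchors on the longest $\ast$-free factor of $P$, then verifies each candidate with one extension per $\ast$-free segment. Python's `str.find` and the naive `lce` function stand in for the suffix-tree lookup and the LCA queries. The function names are hypothetical.

```python
from typing import List, Tuple


def lce(X: str, a: int, Y: str, b: int) -> int:
    """Longest common prefix of X[a:] and Y[b:] (naive stand-in for an LCA query)."""
    k = 0
    while a + k < len(X) and b + k < len(Y) and X[a + k] == Y[b + k]:
        k += 1
    return k


def star_free_segments(P: str, star: str) -> List[Tuple[int, str]]:
    """Maximal segments of P without don't care letters, as (offset, segment)."""
    segs: List[Tuple[int, str]] = []
    k, m = 0, len(P)
    while k < m:
        if P[k] == star:
            k += 1
            continue
        e = k
        while e < m and P[e] != star:
            e += 1
        segs.append((k, P[k:e]))
        k = e
    return segs


def dont_care_occurrences(T: str, P: str, star: str = "*") -> List[int]:
    m, n = len(P), len(T)
    segs = star_free_segments(P, star)
    if not segs:                                        # P consists of don't cares only
        return list(range(n - m + 1))
    off, anchor = max(segs, key=lambda s: len(s[1]))    # longest *-free factor of P
    out = []
    start = T.find(anchor)
    while start != -1:
        pos = start - off                               # candidate occurrence of P
        if 0 <= pos <= n - m and all(lce(T, pos + o, seg, 0) >= len(seg) for o, seg in segs):
            out.append(pos)
        start = T.find(anchor, start + 1)
    return out
```

Each candidate costs one extension per segment, so the batch size is governed by the number of anchor occurrences.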

Circular string matching. We are given a pattern $P$ of length $m$ and a text $T$ of length $n$. The problem is to find all occurrences of $P$ or any of its cyclic shifts in $T$ [3]. The first step is to build $\text{ST}(S)$, where $S$ is the concatenation of $PP$, $T$ and their reverse images $(PP)^R$ and $T^R$, and $X^R$ denotes the reverse image of string $X$. We then conceptually split $P$ in two fragments of lengths $\lfloor m/2\rfloor$ and $\lceil m/2\rceil$. Any cyclic shift of $P$ contains as a factor at least one of the two fragments, since it is a length-$m$ factor of $PP$. We thus locate the two subtrees rooted at the highest explicit nodes corresponding to the fragments. We also locate in the same subtrees all terminal nodes corresponding to starting positions of the fragments in $T$. Note that if $P$ is long enough, we may have only a few such nodes. We create a batch of LCA queries in order to extend to the left and to the right, and report occurrences when LCA queries extend a match to length at least $m$. (This algorithm can be easily generalised for any number of patterns.)
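Here is a sketch of the fragment-based filter for circular string matching. Python's `str.find` and naive letter comparisons stand in for the suffix-tree lookups and the LCA-based left and right extensions; this is an assumption for brevity, and the function names are hypothetical. A cyclic shift starting at offset $s$ is the factor $PP[s..s+m-1]$. It contains the second fragment when $s\le\lfloor m/2\rfloor$, and the copy of the first fragment at offset $m$ otherwise.

```python
from typing import List


def lce(X: str, a: int, Y: str, b: int) -> int:
    """Longest common prefix of X[a:] and Y[b:] (naive stand-in for an LCA query)."""
    k = 0
    while a + k < len(X) and b + k < len(Y) and X[a + k] == Y[b + k]:
        k += 1
    return k


def lce_left(X: str, a: int, Y: str, b: int) -> int:
    """Longest common suffix of X[:a] and Y[:b] (the text uses reversed strings)."""
    k = 0
    while k < a and k < b and X[a - 1 - k] == Y[b - 1 - k]:
        k += 1
    return k


def circular_occurrences(T: str, P: str) -> List[int]:
    """Positions j such that T[j:j+m] is a cyclic shift of P."""
    m, h = len(P), len(P) // 2
    PP = P + P
    anchors = [(P[h:], h), (P[:h], m)]        # (fragment, its offset in PP)
    found = set()
    for frag, o in anchors:
        if not frag:                            # m == 1: the first fragment is empty
            continue
        start = T.find(frag)
        while start != -1:
            right = lce(T, start, PP, o)
            left = lce_left(T, start, PP, o)
            # shift s is confirmed when left >= o - s and right >= s + m - o
            for s in range(max(0, o - left), min(m - 1, o + right - m) + 1):
                found.add(start - o + s)
            start = T.find(frag, start + 1)
    return sorted(found)
```

Each fragment occurrence needs one extension to the left and one to the right, which matches the batch of LCA queries described above.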

## 6 Experimental Results

We have implemented algorithms , , and in the C++ programming language. We have also implemented the same algorithms applied on the original array $A$, denoted by , , and , respectively, as well as the brute-force algorithm for answering RMQs in the two corresponding flavours, denoted by and . For the implementation of and we used the sdsl-lite library [13]. If an algorithm requires $\mathcal{T}$ time and $\mathcal{S}$ extra space, we say that the algorithm has complexity $\langle\mathcal{T},\mathcal{S}\rangle$. Table 1 summarises the implemented algorithms. The following experiments were conducted on a desktop PC using one core of an Intel Core i5-4690 CPU at 3.50GHz and 16GB of RAM. All programs were compiled with g++ version 5.4.0 at optimisation level 3 (-O3).

Experiment I. We generated random (uniform distribution) input arrays of and entries (integers), and random (uniform distribution) lists of queries of sizes varying from to , doubling each time. We compared the runtime of the implementations of the algorithms in Table 1 on these inputs; in particular, for each algorithm, we compared the standard implementation against the one with the contracted array. We used the large array, , for and because they are significantly faster and the small one, , for and . The results plotted in Figure ? show that the proposed scheme of contracting the input array improves the performance for all implementations substantially.

Experiment II. We generated random input arrays of entries, and random lists of queries of sizes varying from to , doubling each time. We then compared the runtime of and on these inputs. The results are plotted in Figure ?. We observe that becomes two times faster than as $q$ grows. Notably, it was not possible to run this experiment with , which implements a succinct data structure for answering RMQs, due to an insufficient amount of main memory.

Experiment III. In addition, we have implemented algorithms and for answering LCA queries. We first generated a random input array of entries and used this array to compute its Cartesian tree. Next we generated random lists of LCA queries of sizes varying from to , doubling each time. We then compared the runtime of and on these inputs. The results plotted in Figure ? show that the implementation of is more than two orders of magnitude faster than the implementation of , highlighting the impact of the proposed scheme on LCA queries.

## 7 Final Remarks

In this article, we presented a new family of algorithms for answering a small batch of RMQs or LCA queries in practice. The main purpose was to show that if the number $q$ of queries is small with respect to $n$ and we have them all at hand, existing algorithms for RMQs and LCA queries can be easily modified to perform in $n+O(q)$ time and $O(q)$ extra space. The presented experimental results indeed show that with this new scheme significant practical improvements can be obtained, in particular for answering a small batch of LCA queries.

Specifically, algorithms and , our modifications to the Sparse Table algorithm, whose main catch is its $O(n\log n)$ space [4], seem to be the best way to answer in practice a small batch of RMQs and LCA queries, respectively. A library implementation of is available at https://github.com/solonas13/rmqo under the GNU General Public License.

### References

1. I/O-efficient range minima queries.
P. Afshani and N. Sitchinava. In SWAT 2014, volume 8503 of LNCS, pages 1–12. Springer, 2014.
2. On (dynamic) range minimum queries in external memory.
L. Arge, J. Fischer, P. Sanders, and N. Sitchinava. In WADS 2013, volume 8037 of LNCS, pages 37–48. Springer, 2013.
3. Fast circular dictionary-matching algorithm.
T. Athar, C. Barton, W. Bland, J. Gao, C. S. Iliopoulos, C. Liu, and S. P. Pissis. Mathematical Structures in Computer Science, 27(2):143–156, 2017.
4. The LCA problem revisited.
M. A. Bender and M. Farach-Colton. In LATIN 2000, volume 1776 of LNCS, pages 88–94. Springer-Verlag, 2000.
5. Lowest common ancestors in trees and directed acyclic graphs.
M. A. Bender, M. Farach-Colton, G. Pemmasani, S. Skiena, and P. Sumazin. Journal of Algorithms, 57(2):75–94, 2005.
6. Recursive star-tree parallel data structure.
O. Berkman and U. Vishkin. SIAM J. Comput., 22(2):221–242, 1993.
7. Algorithms on strings.
M. Crochemore, C. Hancart, and T. Lecroq. Cambridge University Press, 2007.
8. Improved range minimum queries.
H. Ferrada and G. Navarro. J. Discrete Algorithms, 43:72–80, 2016.
9. Theoretical and practical improvements on the RMQ-problem, with applications to LCA and LCE.
J. Fischer and V. Heun. In CPM 2006, volume 4009 of LNCS, pages 36–48. Springer Berlin Heidelberg, 2006.
10. Scaling and related techniques for geometry problems.
H. N. Gabow, J. L. Bentley, and R. E. Tarjan. In STOC 1984, pages 135–143. ACM, 1984.
11. A linear-time algorithm for a special case of disjoint set union.
H. N. Gabow and R. E. Tarjan. Journal of Computer and System Sciences, 30(2):209–221, 1985.
12. A simple optimal representation for balanced parentheses.
R. F. Geary, N. Rahman, R. Raman, and V. Raman. Theor. Comput. Sci., 368(3):231–246, 2006.
13. From theory to practice: Plug and play with succinct data structures.
S. Gog, T. Beller, A. Moffat, and M. Petri. In SEA, volume 8504 of LNCS, pages 326–337, 2014.
14. Fast algorithms for finding nearest common ancestors.
D. Harel and R. E. Tarjan. SIAM J. Comput., 13(2):338–355, 1984.
15. The longest common extension problem revisited and applications to approximate string searching.
L. Ilie, G. Navarro, and L. Tinta. J. Discrete Algorithms, 8(4):418–428, 2010.
16. A first approach to finding common motifs with gaps.
C. Iliopoulos, J. Mchugh, P. Peterlongo, N. Pisanti, W. Rytter, and M.-F. Sagot. International Journal of Foundations of Computer Science, 16(6):1145–1155, 2005.
17. Pattern Matching and Consensus Problems on Weighted Sequences and Profiles.
T. Kociumaka, S. P. Pissis, and J. Radoszewski. In ISAAC 2016, volume 64 of LIPIcs, pages 46:1–46:12. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2016.
18. Efficient string matching with don’t-care patterns.
R. Y. Pinter. In Combinatorial Algorithms on Words, volume F12 of NATO ASI Series, pages 11–29. Springer Berlin Heidelberg, 1985.
19. MoTeX-II: structured motif extraction from large-scale datasets.
S. P. Pissis. BMC Bioinformatics, 15:235, 2014.
20. New results on the size of tries.
M. Régnier and P. Jacquet. IEEE Trans. Information Theory, 35(1):203–205, 1989.