A Static Optimality Transformation with Applications to Planar Point Location
Over the last decade, there have been several data structures that, given a planar subdivision and a probability distribution over the plane, provide a way for answering point location queries that is fine-tuned for the distribution. All these methods suffer from the requirement that the query distribution must be known in advance.
We present a new data structure for point location queries in planar triangulations. Our structure is asymptotically as fast as the optimal structures, but it requires no prior information about the queries. This is a 2-d analogue of the jump from Knuth’s optimum binary search trees (discovered in 1971) to the splay trees of Sleator and Tarjan in 1985. While the former need to know the query distribution, the latter are statically optimal. This means that we can adapt to the query sequence and achieve the same asymptotic performance as an optimum static structure, without needing any additional information.
=10000 = 10000
We consider the problem of finding a statically optimal data structure for planar point location in triangulations. This problem and related problems have a long history that goes back to the dawn of computer science. Thus, before giving a formal description of the problem and of our results, let us first provide some background on the history and motivation behind our work.
Comparison-based predecessor search constitutes one of the oldest problems in computer science: given a set from a totally ordered universe , we would like to construct a data structure for answering predecessor queries. In such a query, we are given an element , and we need to return the largest with (or , if no such exists). In the most general decision-tree model, we are allowed to evaluate in each step an arbitrary function on , where the choice of may depend on the outcomes of the previous evaluations. The classic solution sorts during preprocessing and answers queries in steps through binary search, where denotes the size of . Information theoretic arguments imply that any such comparison-based algorithm requires steps in the worst case (see, e.g., Ailon et al.  for more details).
However, the story does not end here. Early in the history of computer science, researchers realized that if the distribution of query outcomes is sufficiently biased, expected-time query processing becomes possible. This insight led to the invention of optimal search trees. These are specialized data structures for the case that the query outcomes are drawn independently from a known fixed distribution, and a wide literature studying their variants and extensions have been developed . In this context, optimality is characterized by the entropy of the distribution: if denotes the probability of the th outcome, the entropy is defined as . Information theory  shows that is a lower bound for the expected number of steps that any comparison-based algorithm needs to answer a predecessor query, assuming that the searches are drawn independently from a fixed distribution (e.g., ).
All the above results require that the distribution, or a suitable approximation thereof, be known in advance. This situation changed in 1985, when Sleator and Tarjan  introduced splay trees. These trees have many amazing properties, not the least of which is called static optimality. This means that for any sufficiently long query sequence, splay trees are asymptotically as fast as optimal static search trees. For this, splay trees require no prior information on the query distribution.
Planar point location is a fundamental problem in computational geometry. A triangulation is a partition of the plane into (possibly infinite) triangles. Given , we need to construct a data structure for point location queries: given a point , return the triangle of that contains it. Again, we use a decision-tree model. This means that in each step we may evaluate an arbitrary function on , where may depend on the previous comparisons.
There are several point location structures with query time, which is optimal in our decision-tree model. These structures are notable not only for achieving optimality, but for doing so through very different methods, such as planar separators , Kirkpatrick’s successive refinement approach , persistence , layered DAGs , or randomized incremental construction .
Once again, it makes sense to consider biased query distributions. For a known fixed distribution of point location queries, there are several data structures that achieve optimal expected query time, assuming independence. These biased structures are analogous to optimal search trees. Thus, we can use the same information theoretic arguments to characterize the optimal expected query time by the entropy of the probabilities of the queried regions .
A series of papers by Arya et al.  converge on two algorithms. The first one achieves query time with space, while the second, simpler, algorithm supports queries in time and space.
1.3Creating a point location structure that is statically optimal
In view of the developments for binary search trees, one question presents itself: Is there a point location structure that is asymptotically as fast as the biased structures, without explicit knowledge of the query distribution? Or, put differently, can a point location structure achieve a running time similar to the static optimality bound of splay trees? This open problem, which we resolve here, explicitly appears in several previous works on point location, e.g., in Arya et al. :
Taking this in a different direction, suppose that the query distribution is not known at all. That is, the probabilities that the query point lies within the various cells of the subdivision are unknown. In the 1-dimensional case it is known that there exist self-adjusting data structures, such as splay trees, that achieve good expected query time in the limit. Do such self-adjusting structures exist for planar point location?
There are several possible approaches towards statically optimal point location. One, suggested above, would be to create some sort of self-adjusting point location structure and to analyze it in a way similar to splay trees. This has not been done; we suspect that the main stumbling block is that all known efficient structures are comparison DAGs : they can be represented as a directed acyclic graph with a unique source and out-degree , such that each node corresponds to a planar region. A point location query proceeds by starting at the source and by following in each step an edge that is determined by comparing the query point with a fixed line. The query continues until it reaches a sink, whose corresponding region constitutes the desired query outcome. In order to achieve reasonable space usage, it seems essential to use a DAG instead of a simple tree. Unfortunately, we do not know how to perform rotation-like local changes in such DAGs that would mimic the behavior of splay trees.
Another possible avenue is to use splay trees in an existing structure. Goodrich et al.  followed this approach, using essentially a hybrid of splay trees and the persistent line-sweep method. Unfortunately, their method does not give a result optimal with respect to the entropy of the original distribution of query outcomes, but rather to the entropy of the probabilities of querying regions of a strip decomposition of the triangulation. The latter is obtained by drawing vertical lines through every point of the triangulation. This strip decomposition could split a high-probability triangle into several parts and could potentially increase the entropy of the query result by , the worst possible; see Figure 1 for an example.
One might also try to create a structure with the working set property. This property, originally used in the analysis of splay trees, states that the processing time for a query is logarithmic in the number of distinct queries since the last query that returned the same result as . The working set property implies static optimality ; it has also proved useful in several other contexts . Most importantly, there is a general transformation from a dynamic time structure into one with the working set property . Unfortunately, even though several dynamic data structures for predecessor searching are known (e.g., AVL trees  or red-black trees ), it remains a prominent open problem to develop a point location structure that supports insertions, deletions, and queries in time. (Note that  claims to modify Kirkpatrick’s method to allow for time insertions, deletions, and queries. The claimed result is wrong.
Our solution to the problem of statically optimal point location is very simple: we take a biased structure that needs to be initialized with distributional information, and we rebuild it periodically using the observed frequencies for each region. We do not store all the regions in the biased structure—this would make the rebuilding step too expensive. Instead, upon rebuilding we create a structure storing only the most frequent items observed so far, where is some suitable constant. We resort to a static time structure to complete queries for the remaining regions. The rebuilding takes place after every queries, for some constant . This is a simple and general method of converting biased structures into statically optimal ones, and it enables us to waive the requirement of distributional knowledge present in all previous biased point location structures, at least for triangulations. Our approach can be seen as a generalization and simplification of a method by Goodrich for dictionaries .
Let be some universal set, and let be a partition of into pieces. The elements of are called points, the subsets in are called regions. A location query takes some point and returns the region with . The result of a location query with input is denoted by . A data structure for location queries is called a location query structure.
Let be a sequence of queries, and let denote the results of these queries. Let be the number of occurrences of in the first elements of , and define , the number of times occurs in the entire sequence. Furthermore, let be the time of the th occurrence of in ; thus .
We use to refer to ; this avoids clutter generated by additive terms that would otherwise be needed to handle degenerate cases of our analysis. We next define the notion of a biased structure.
Suppose we choose proportional to the number of queries that return the given region, e.g., . In this case, a biased location query structure achieves an amortized query time that is (of the order of) the entropy of the query distribution. As we argued in the introduction, this is optimal for our decision-tree model. We now define the notion of static optimality.
Note that a statically optimal structure is given neither the frequency function nor any weights in advance, in particular, the structure does not need to be static.
We provide a simple method for making a biased location query structure statically optimal, assuming a few technical conditions. The main such condition is that we should be able to construct the biased query structure not just on the set , but on any subset of . We require that a location query structure for performs as quickly as a biased structure for when a region in is queried, and that it reports failure in time if the query lies outside of . Formally:
Note that is just the number of queries that result in failure. Given Definition ?, we may now state our main theorem:
We now describe the construction for Theorem ?. By assumption, we are given a set of regions, and we have available an time location query structure on with construction cost as well as a subset-biased structure with construction cost .
3.1Description of the structure
Let and be two constants such that (e.g., and ). The simple idea behind our transformation is as follows: after every queries, we build a subset-biased structure for the most commonly accessed regions, in time. We also keep a static time structure as a backup for failed queries in the subset-biased structure. Formally, the structure has several parts:
A static query time structure.
A structure that keeps track of how often each region was queried and that is capable of reporting the most popular regions in time. Since in each step we increment the count for a single region by , we can easily maintain such a structure in linear space and constant time per update. (The additional space overhead can be made sublinear at the expense of determinism thorough the use of a streaming algorithm for the so-called heavy hitters problem (e.g.,). This shows that our transformation is also useful in a context where additional space is at a premium, for example for implicit data structures or when the data resides in read-only memory ).
A subset-biased structure that is built after queries and rebuilt every th query thereafter. The structure contains the at most most popular regions at the time of the rebuilding that have been queried at least times. In the choice of these regions, we break ties arbitrarily. The weight of a region , denoted , is the number of queries to at the time of the rebuilding. More precisely, if the rebuilding is at time , we set . Computing the most popular regions and the weight function takes time with the structure from Part ?. By assumption, the construction cost of is .
A search is executed on the subset-biased structure first. If it fails (at amortized cost ), it is executed in the static time structure.
3.2Initial analysis of structure
We will now analyze the properties of our structure. Our first lemma describes a key property of the rebuilding process: for any sufficiently popular region , the amortized query time for is proportional to the amortized query time a biased structure would achieve if it were weighted with the frequencies observed so far.
Proof: Since , we have . Thus, we first query the subset-biased structure . Suppose that has been rebuild last at time . There are two cases.
Suppose first that is contained in . Definition ? ensures that the amortized time for the query in is , where denotes the total number of queries for the regions in at time . We have (there have been queries so far) and (there have been at most queries since rebuilding). The lemma follows.
Now suppose that is not in . In this case, the query takes amortized time in and time in the static structure. We know that at time , there were regions at least as popular as . Thus, . It follows that
and the claimed bound suffices to account for the query time.
Using Lemma ?, we can now bound the running time in terms of the query frequencies.
Proof: The main summation is over the regions in . For each region , the initial (or less) queries take time , since during these queries is never in the subset-biased structure. The running times for the remaining queries (if any) are bounded using Lemma ?. The first additional term comes from the construction cost of the subset-biased structure, incurred every operations. The final term is the linear one-time cost to build the static structure.
In order to simplify the bound in Lemma ?, we need two technical lemmas to deal with the various terms. The first lemma shows how to simplify the summation for the later queries.
Proof: Since , we have . Also, . Thus,
Here, we used Stirling’s formula to bound .
The second lemma deals with the time for the initial queries.
Proof: Set . If , the lemma holds since
If , then
for large enough, as desired (recall that we defined to be at least ).
We can now prove our main theorem.
Proof: By Definition ?, we need to prove that the execution time is
By Lemma ?, the running time is bounded by
We now apply Lemma ? and note that to obtain a running time bound of
Since we defined to be at least , this simplifies to
If , the sum over is at most . In this case, the bound simplifies to , and the theorem is proved. Otherwise, if , Lemma ? applies with (a legal choice by our assumption on and ), and the term collapses into to give the theorem.
Proof: It is easy to apply our general transformation to the problem of planar point location in a triangulation, as all of the required ingredients are well known. We assume that the triangulation is given in a standard representation, such as a doubly-connected edge list (e.g.,).
For the static structure with query time and construction time, Kirkpatrick’s algorithm  can be used. For the subset-biased structure, the provided subset of triangles may not be a connected triangulation and thus needs to be triangulated; this takes time using the classic line sweep approach . This creates new triangles, which are marked specially and given small weights. The resultant triangulation and weighting is given to a biased structure such as Iacono’s . The marking can be used to detect whether a query to the subset-biased structure was successful. With all ingredients in hand, the claim now follows from Theorem ?.
Our choice of structures reflects a desire for the strongest asymptotic bounds possible. Thus, we have avoided structures that are randomized or that have non-linear construction cost; such structures, however, have far superior constants than the ones we use. If we took a data structure for the static time queries with an construction cost instead of , this would simply change the linear additive term in Theorem ? to .
5Point location in polygonal subdivisions with non-constant sized cells
Our work applies to point location in triangulations. It can also be extended to polygonal subdivisions where each region has constant complexity. Indeed, suppose every region has edges. We can just triangulate each region and then apply our result. As mentioned in the introduction, this operation could increase the entropy of the query outcomes. However, the log sum inequality  implies that for any nonnegative and . Thus, if we subdivide a region with probability into triangles, the entropy increases by at most . It follows that the overall entropy grows by at most , which is acceptable if is constant.
Recently, several data structures have been developed for optimal point location where the distribution is known in advance for convex connected , connected , and arbitrary polygonal  subdivisions of the plane, as well as the more general odds-on trees . Unfortunately, these structures are not biased according to our definition, since entropy-based lower bounds are not meaningful for them: a convex -gon splits the plane into two regions, so the entropy of the query outcomes is constant. Nonetheless, some distributions require time for a point location query (in a reasonable model of computation that is more restrictive than the one described here).
The entropy-sensitive structures for non-triangulations all basically work by triangulating the given subdivision as a function of the provided probability distribution, and then using one of the biased structures on the resultant triangulation. The main conceptual problem in using our framework with such a structure is that it is unclear how to triangulate during the rebuilding process, since the optimal triangulation is not known in advance. One could imagine that triangulating during each rebuild based on the observed queries so far would work well, but proving this would require a more complex and specialized analysis than what has been presented in this paper.
The second author would like to thank Pat Morin for suggesting the problem to him, for stimulating discussions on the subject, and for hosting him during a wonderful stay at the Computational Geometry Lab at Carleton University. We would also like to thank the anonymous referees for insightful comments that helped improve the presentation of the paper.
- In this context, query time refers to the expected depth of the associated decision tree.
- The method presented makes the assumption that given a triangle in a triangulation of size on which a Kirpatrick hierarchy has been built, the complexity of the intersection of with any level of the hierarchy is constant; this is false as examples where the intersection is size are easy to produce.
- An algorithm for organization of information.
G. M. Adelson-Velskiĭ and E. M. Landis. Dokl. Akad. Nauk SSSR, 146:263–266, 1962.
- Self-improving algorithms.
N. Ailon, B. Chazelle, K. L. Clarkson, D. Liu, W. Mulzer, and C. Seshadhri. SIAM J. Comput., 40(2):350–375, 2011.
- Efficient expected-case algorithms for planar point location.
S. Arya, S.-W. Cheng, D. M. Mount, and R. Hariharan. In Proc. 7th Scandinavian Workshop on Algorithm Theory (SWAT), volume 1851 of Lecture Notes in Computer Science, pages 353–366. Springer-Verlag, 2000.
- Nearly optimal expected-case planar point location.
S. Arya, T. Malamatos, and D. M. Mount. In Proc. 41st Annu. IEEE Sympos. Found. Comput. Sci. (FOCS), pages 208–218, 2000.
- Entropy-preserving cuttings and space-efficient planar point location.
S. Arya, T. Malamatos, and D. M. Mount. In Proc. 12th Annu. ACM-SIAM Sympos. Discrete Algorithms (SODA), pages 256–261, 2001.
- A simple entropy-based algorithm for planar point location.
S. Arya, T. Malamatos, and D. M. Mount. In Proc. 12th Annu. ACM-SIAM Sympos. Discrete Algorithms (SODA), pages 262–268, 2001.
- A simple entropy-based algorithm for planar point location.
S. Arya, T. Malamatos, and D. M. Mount. ACM Trans. Algorithms, 3(2):Art. 17, 17 pp, 2007.
- Optimal expected-case planar point location.
S. Arya, T. Malamatos, D. M. Mount, and K. C. Wong. SIAM J. Comput., 37(2):584–610, 2007.
- Constant-work-space algorithms for geometric problems.
T. Asano, W. Mulzer, G. Rote, and Y. Wang. JoCG, 2(1):46–68, 2011.
- Biased search trees.
S. W. Bent, D. D. Sleator, and R. E. Tarjan. SIAM J. Comput., 14(3):545–568, 1985.
- Computational Geometry: Algorithms and Applications.
M. de Berg, O. Cheong, M. van Kreveld, and M. Overmars. Springer-Verlag, Berlin, third edition, 2008.
- Space-optimal heavy hitters with strong error bounds.
R. Berinde, P. Indyk, G. Cormode, and M. J. Strauss. ACM Trans. Database Syst., 35(4):Art. 26, 28 pp, 2010.
- Odds-on trees.
P. Bose, L. Devroye, K. Douïeb, V. Dujmovic, J. King, and P. Morin. arXiv:1002.1092
- Point location in disconnected planar subdivisions.
P. Bose, L. Devroye, K. Douïeb, V. Dujmovic, J. King, and P. Morin. arXiv:1001.2763
- Layered working-set trees.
P. Bose, K. Douïeb, V. Dujmovic, and J. Howat. In Proc. 9th Latin American Theoretical Informatics Symposium (LATIN), volume 6034 of Lecture Notes in Computer Science, pages 686–696. Springer-Verlag, 2010.
- Dynamic optimality for skip lists and B-trees.
P. Bose, K. Douïeb, and S. Langerman. In Proc. 19th Annu. ACM-SIAM Sympos. Discrete Algorithms (SODA), pages 1106–1114, 2008.
- Distribution-sensitive point location in convex subdivisions.
S. Collette, V. Dujmovic, J. Iacono, S. Langerman, and P. Morin. In Proc. 19th Annu. ACM-SIAM Sympos. Discrete Algorithms (SODA), pages 912–921, 2008.
- Entropy, triangulation, and point location in planar subdivisions.
S. Collette, V. Dujmovic, J. Iacono, S. Langerman, and P. Morin. arXiv:0901.1908
- Elements of Information Theory.
T. M. Cover and J. A. Thomas. Wiley-Interscience, second edition, 2006.
- Optimal point location in a monotone subdivision.
H. Edelsbrunner, L. J. Guibas, and J. Stolfi. SIAM J. Comput., 15(2):317–340, 1986.
- A priority queue with the working-set property.
A. Elmasry. Internat. J. Found. Comput. Sci., 17(6):1455–1465, 2006.
- Implicit data structures for weighted elements.
G. N. Frederickson. Inform. and Control, 66(1-2):61–82, 1985.
- Two applications of a probabilistic search technique: Sorting and building balanced search trees.
M. L. Fredman. In Proc. 7th Annu. ACM Sympos. Theory Comput. (STOC), pages 240–244, 1975.
- A new algorithm for minimum cost binary trees.
A. M. Garsia and M. L. Wachs. SIAM J. Comput., 6(4):622–642, 1977.
- Competitive tree-structured dictionaries.
M. T. Goodrich. In Proc. 11th Annu. ACM-SIAM Sympos. Discrete Algorithms (SODA), pages 494–495, 2000.
- Methods for achieving fast query times in point location data structures.
M. T. Goodrich, M. Orletsky, and K. Ramaiyer. In Proc. 8th Annu. ACM-SIAM Sympos. Discrete Algorithms (SODA), pages 757–766, 1997.
- A dichromatic framework for balanced trees.
L. J. Guibas and R. Sedgewick. In Proc. 19th Annu. IEEE Sympos. Found. Comput. Sci. (FOCS), pages 8–21, 1978.
- Optimal computer search trees and variable-length alphabetical codes.
T. C. Hu and A. C. Tucker. SIAM J. Appl. Math., 21:514–532, 1971.
- Improved upper bounds for pairing heaps.
J. Iacono. In Proc. 7th Scandinavian Workshop on Algorithm Theory (SWAT), volume 1851 of Lecture Notes in Computer Science, pages 32–45. Springer-Verlag, 2000.
- Alternatives to splay trees with worst-case access times.
J. Iacono. In Proc. 12th Annu. ACM-SIAM Sympos. Discrete Algorithms (SODA), pages 516–522, 2001.
- Expected asymptotically optimal planar point location.
J. Iacono. Comput. Geom. Theory Appl., 29(1):19–22, 2004.
- Key-independent optimality.
J. Iacono. Algorithmica, 42(1):3–10, 2005.
- A new proof of the Garsia-Wachs algorithm.
J. H. Kingston. J. Algorithms, 9(1):129–136, 1988.
- Optimal search in planar subdivisions.
D. Kirkpatrick. SIAM J. Comput., 12(1):28–35, 1983.
- Optimum binary search trees.
D. E. Knuth. Acta Inform., 1:14–25, 1971.
- Greedy binary search trees are nearly optimal.
J. F. Korsh. Inform. Process. Lett., 13(1):16–19, 1981.
- Weighted multidimensional B-trees used as nearly optimal dynamic dictionaries.
H.-P. Kriegel and V. K. Vaishnavi. In Proc. 10th Symposium on Mathematical Foundations of Computer Science (MFCS), volume 118 of Lecture Notes in Computer Science, pages 410–417. Springer-Verlag, 1981.
- Location of a point in a planar subdivision and its applications.
D. T. Lee and F. P. Preparata. SIAM J. Comput., 6(3):594–606, 1977.
- A separator theorem for planar graphs.
R. J. Lipton and R. E. Tarjan. SIAM J. Appl. Math., 36(2):177–189, 1979.
- Applications of a planar separator theorem.
R. J. Lipton and R. E. Tarjan. SIAM J. Comput., 9(3):615–627, 1980.
- Nearly optimal binary search trees.
K. Mehlhorn. Acta Inform., 5(4):287–295, 1975.
- A best possible bound for the weighted path length of binary search trees.
K. Mehlhorn. SIAM J. Comput., 6(2):235–239, 1977.
- Dynamic binary search.
K. Mehlhorn. SIAM J. Comput., 8(2):175–198, 1979.
- Arbitrary weight changes in dynamic trees.
K. Mehlhorn. RAIRO Inform. Théor., 15(3):183–211, 1981.
- A fast planar partition algorithm. I.
K. Mulmuley. J. Symbolic Comput., 10(3-4):253–280, 1990.
- Planar point location using persistent search trees.
N. Sarnak and R. E. Tarjan. Commun. ACM, 29(7):669–679, 1986.
- A simple and fast incremental randomized algorithm for computing trapezoidal decompositions and for triangulating polygons.
R. Seidel. Comput. Geom. Theory Appl., 1(1):51–64, 1991.
- The Mathematical Theory of Communication.
C. E. Shannon and W. Weaver. The University of Illinois Press, Urbana, Ill., 1949.
- Self-adjusting binary search trees.
D. D. Sleator and R. E. Tarjan. J. ACM, 32(3):652–686, 1985.
- Three major extensions to Kirkpatrick’s point location algorithm.
A. Z. B. H. Talib, M. Chen, and P. Townsend. In Proc. Conference on Computer Graphics International (CGI), pages 112–121, 1996.
- Dynamic weighted binary search trees.
K. Unterauer. Acta Inform., 11(4):341–362, 1978/79.
- Efficient dynamic programming using quadrangle inequalities.
F. F. Yao. In Proc. 12th Annu. ACM Sympos. Theory Comput. (STOC), pages 429–435, 1980.