External Memory Planar Point Location with Fast Updates

John Iacono   Ben Karsin   Grigorios Koumoutsos

Université Libre de Bruxelles, Belgium
New York University, USA
{johniacono, bkarsin, gregkoumoutsos}@gmail.com

This work was supported by the Fonds de la Recherche Scientifique-FNRS under Grant no. MISU F 6001 1 and by NSF Grant CCF-1533564.
Abstract

We study dynamic planar point location in the External Memory Model or Disk Access Model (DAM). Previous work in this model achieves polylogarithmic query and polylogarithmic amortized update time. We present a data structure with $O(\log_B^2 N)$ query time and $O(\frac{1}{B^{1-\epsilon}}\log_B N)$ amortized update time, where $N$ is the number of segments, $B$ is the block size and $\epsilon$ is a small positive constant. This is a factor of $B^{1-\epsilon}$ faster for updates than the fastest previous structure, and brings the cost of insertion and deletion down to subconstant amortized time for reasonable choices of $N$ and $B$. Our structure solves the problem of vertical ray-shooting queries among a dynamic set of interior-disjoint line segments; this is well known to solve dynamic planar point location for a connected subdivision of the plane.


1 Introduction

The dynamic planar point location problem is one of the most fundamental and extensively studied problems in geometric data structures. It is defined as follows: we are given a connected planar polygonal subdivision $\Gamma$ with $N$ edges, and for any query point $q$ the goal is to find the face of $\Gamma$ that contains $q$, subject to insertions and deletions of edges. An equivalent formulation, which we use here, is the following: given a set $S$ of $N$ interior-disjoint line segments in the plane, for any query point $q$, report the first segment of $S$ hit by a vertical upward ray emanating from $q$, subject to insertions and deletions of segments.

Dynamic planar point location has many applications in spatial databases, geographic information systems (GIS), computer graphics, etc. Moreover, it is a natural generalization of the dynamic dictionary problem with predecessor queries, which can be seen as the one-dimensional variant of planar point location.

In this paper we focus on the External Memory model, also known as the Disk Access Model (DAM) [2]. The DAM is the standard model for designing algorithms that efficiently execute on large datasets stored in secondary storage. It assumes a two-level memory hierarchy, consisting of a disk and an internal memory, and is parameterized by values $M$ and $B$: the disk is partitioned into blocks of size $B$, of which $M/B$ can be stored in internal memory at any given moment. The cost of an algorithm in the DAM is the number of block transfers between memory and disk, called Input-Output operations (I/Os). The quintessential DAM-model data structure is the B-tree [11]; see [24, 25] for surveys. Many applications of dynamic planar point location, such as GIS problems, must efficiently process datasets that are too massive to fit in internal memory, so it is of great relevance and interest to consider the problem in the DAM and to devise I/O-efficient algorithms.

1.1 Previous Work

RAM Model.

In the RAM model (the leading model for applications where all data fit in internal memory) the dynamic planar point location problem has been extensively studied [4, 10, 18, 17, 14, 20]. It is a major and long-standing open problem in computational geometry to design a data structure that supports queries and updates in $O(\log n)$ time [15, 16, 23], i.e., to achieve the same bounds as for the dynamic dictionary problem. In a recent breakthrough, Chan and Nekrich presented in FOCS'15 [14] a data structure supporting queries in $O(\log n (\log\log n)^2)$ time and updates in $O(\log n \log\log n)$ time. They also showed tradeoffs allowing faster queries at the cost of slower updates, and vice versa.

Recently Oh and Ahn [22] presented the first data structure for a more general setting where the polygonal subdivision is not necessarily connected; their data structure supports queries in time and updates in amortized time.

External Memory model (See Table 1).

Several data structures supporting queries and updates in polylog$(N)$ I/Os have been presented over the years [1, 7, 5]; Table 1 lists these prior results. The best known update bound, $O(\log_B N)$ amortized I/Os, is achieved by Arge, Brodal and Rao [5]; the query time of their data structure is $O(\log_B^2 N)$. Very recently, Munro and Nekrich [21] announced the first data structure that supports queries in $O(\log_B N)$ I/Os. However, their update time is slightly worse than logarithmic.

Reference Space Query Time Insertion Time Deletion Time
Agarwal et al. [1]
Arge and Vahrenhold [7]
Arge et al. [5]
Munro and Nekrich [21]
This paper
Table 1: Overview of results on dynamic planar point location in external memory. Query bounds are worst-case and update bounds are amortized. Space usage is measured in words.

Fast Updates in External Memory.

One of the most intriguing and practically relevant features of the external memory model is that it allows fast updates. For the dynamic dictionary problem with predecessor queries, the optimal update bound in the RAM model is $\Theta(\log n)$. In external memory, however, B-trees achieve the optimal query time of $O(\log_B N)$ I/Os with a typical update time of $O(\log_B N)$ I/Os, although substantially faster update times are possible. Brodal and Fagerberg [13] showed that $O(\frac{1}{B^{1-\epsilon}}\log_B N)$ amortized I/Os per update can be supported, for a small positive constant $\epsilon$, while retaining $O(\log_B N)$-time queries; they further showed that this is an asymptotically optimal tradeoff between updates and queries. Observe that this update bound is a huge speedup over $O(\log_B N)$ and that for reasonable choices of $N$, $B$ and $\epsilon$, it yields a subconstant amortized number of I/Os per update.
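To make the subconstant claim concrete, the following quick calculation plugs in illustrative parameter values (the specific numbers are our own assumptions for the sake of the example, not prescribed by the model):

```python
import math

# Illustrative parameters (assumed values for the sake of the example)
N = 2**30        # number of elements
B = 2**10        # block size, in elements
eps = 0.5        # the constant epsilon in the buffered-tree tradeoff

log_B_N = math.log(N, B)                    # log_B N = 3
btree_update = log_B_N                      # classic B-tree: O(log_B N) I/Os per update
buffered_update = log_B_N / B**(1 - eps)    # Brodal-Fagerberg: O((1/B^(1-eps)) log_B N) amortized

print(f"log_B N          = {log_B_N:.2f}")
print(f"B-tree update    ~ {btree_update:.2f} I/Os")
print(f"buffered update  ~ {buffered_update:.4f} I/Os per update (subconstant)")
```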

Given this progress and the fact that in the RAM model the bounds achieved for planar point location and the dictionary problem are believed to coincide, it is natural to conjecture that a similar update bound can be achieved for the dynamic planar point location problem. However, to date no result has been presented that achieves sublogarithmic insertion or deletion time.

1.2 Our Results

We consider the dynamic planar point location problem in the external memory model and present the first data structure with a sublogarithmic amortized update time of $O(\frac{1}{B^{1-\epsilon}}\log_B N)$ I/Os. Prior to our work, the best update bound for both insertions and deletions was $O(\log_B N)$ amortized I/Os, achieved by Arge et al. [5]. Our main result is:

Theorem 1.1 (Main result).

For any constant $0 < \epsilon < 1$, there exists a data structure which uses $O(N)$ space, answers planar point location queries in $O(\log_B^2 N)$ I/Os and supports insertions and deletions in $O(\frac{1}{B^{1-\epsilon}}\log_B N)$ amortized I/Os. The data structure can be constructed in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os.

To obtain this result, several techniques are used. Our primary data structure is an augmented interval tree [19]. We combine both the primary interval tree and two auxiliary structures described below with the buffering technique [13, 3] to improve insertion and deletion bounds. In Section 2 we prove Theorem 1.1 using our auxiliary structures as black boxes and omit some technical details relating to rebuilding; these details are deferred to Section 5.

Our first auxiliary structure answers vertical ray-shooting queries among non-intersecting segments whose left (right) endpoints lie on the same vertical line; it is called the left (right) structure. Left/right structures of Agarwal et al. [1], which support queries and updates in $O(\log_B N)$ I/Os, are used by several prior works [1, 7, 5]. Our structure improves on their result by reducing the update bound by a factor of $B^{1-\epsilon}$. We obtain the following result, whose proof is the topic of Section 3:

Theorem 1.2 (Left/right structure).

For a set of $K$ non-intersecting segments whose right (left) endpoints lie on the same vertical line and a constant $0 < \epsilon < 1$, we can create a data structure which supports vertical ray-shooting queries in $O(\log_B K)$ I/Os and insertions and deletions in $O(\frac{1}{B^{1-\epsilon}}\log_B K)$ amortized I/Os. This data structure uses $O(K)$ space and can be constructed in $O(\frac{K}{B}\log_{M/B}\frac{K}{B})$ I/Os. If the segments are already sorted, it can be constructed in $O(K/B)$ I/Os.

Our second auxiliary structure answers vertical ray-shooting queries among non-intersecting segments whose endpoints lie on a set of at most $B^{\epsilon}$ vertical lines. These vertical lines define vertical slabs, hence the structure is called a multislab structure. We obtain the following result, whose proof is the topic of Section 4:

Theorem 1.3 (Multislab structure).

For a constant $0 < \epsilon < 1$ and a set of $K$ non-intersecting segments whose endpoints lie on at most $B^{\epsilon}$ vertical lines, we can create a data structure which supports vertical ray-shooting queries in $O(\log_B K)$ I/Os and insertions and deletions in $O(\frac{1}{B^{1-\epsilon}}\log_B K)$ amortized I/Os. This data structure uses $O(K)$ space and can be constructed in $O(\frac{K}{B}\log_{M/B}\frac{K}{B})$ I/Os. If the segments are already sorted according to a total order, it can be constructed in $O(K/B)$ I/Os.

A major challenge faced by previous multislab structures is how to efficiently support insertions. At a high level, insertions are hard to handle when a total order over the segments is maintained: each time a new segment is inserted we need to determine its position in the total order, which cannot be done quickly. Arge and Vahrenhold [7] developed a deletion-only multislab data structure and then used the so-called logarithmic method [12], which allowed them to handle insertions in $O(\log_B^2 N)$ amortized I/Os. Later, Arge, Brodal and Rao [5] developed a more complicated multislab structure supporting insertions in $O(\log_B N)$ amortized I/Os by performing a separate case analysis depending on the values of the model parameters.

Here, we support insertions in a much simpler way by breaking each inserted segment into smaller unit segments whose endpoints lie on two consecutive vertical lines and which can therefore be compared easily with the segments already stored. This allows insertions to be handled easily; adding buffering on top then yields sublogarithmic update bounds.

1.3 Notation and Preliminaries

External Memory Model.

Throughout this paper we focus on the external memory model of computation. $N$ denotes the number of segments in the planar subdivision, $B$ the block size and $M$ the number of elements that fit in internal memory. We assume that $N > M$ and that $M \geq B^2$ (the tall cache assumption). It is well known that sorting $N$ elements requires $\Theta(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os [2]. Given that $M \geq B^2$, this bound is $O(\frac{N}{B}\log_B N)$. We use this bound for sorting in many places without further explanation.

Ray-shooting Queries.

In the rest of this paper, we focus on answering vertical ray-shooting queries in a dynamic set of non-intersecting line segments. Let $S$ be the set of segments of the polygonal subdivision $\Gamma$. Given a query point $q$, the answer to a vertical ray-shooting query is the first segment of $S$ hit by a vertical ray emanating from $q$ in the $+y$ direction. It is well known that if the polygonal subdivision is connected, a planar point location query for a point $q$ can be answered in $O(1)$ additional I/Os after answering a vertical ray-shooting query for $q$ [7].

$B^{\epsilon}$-Trees.

All tree structures that we use are variants of $B^{\epsilon}$-trees [13], which are B-trees except that the internal nodes have at most $B^{\epsilon}$ (rather than $B$) children; the leaves still store $\Theta(B)$ data items. For constant $\epsilon > 0$, this changes neither the asymptotic height of the tree nor the search cost; both remain $O(\log_B N)$.
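As a sanity check, the height claim follows from a one-line calculation (standard textbook reasoning, not specific to this paper):

\[
\text{height of a } B^{\epsilon}\text{-tree on } N \text{ items} \;=\; O\!\left(\log_{B^{\epsilon}} N\right) \;=\; O\!\left(\frac{\log N}{\epsilon \log B}\right) \;=\; O\!\left(\tfrac{1}{\epsilon}\log_B N\right) \;=\; O(\log_B N) \quad \text{for constant } \epsilon > 0 .
\]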

2 Overall Structure

In this section we prove Theorem 1.1, using the data structures of Theorems 1.2 and 1.3 (detailed in Sections 3 and 4, respectively). Given $N$ non-intersecting segments in the plane and a constant $0 < \epsilon < 1$, we construct an $O(N)$-space data structure which answers vertical ray-shooting queries in $O(\log_B^2 N)$ I/Os and supports updates in $O(\frac{1}{B^{1-\epsilon}}\log_B N)$ amortized I/Os. Throughout this section we let .

The Data Structure.

As in previous works on planar point location, our primary data structure is based on the interval tree (specifically, the external interval tree defined in [9]). Our interval tree $T$ is a $B^{\epsilon}$-tree which stores the $x$-coordinates of the segment endpoints in its leaves. For clarity of presentation we assume here that the interval tree is static, i.e., all newly inserted segments share $x$-coordinates with already stored segments; in Section 5 we remove this assumption and extend our data structure to accommodate new $x$-coordinates while achieving the bounds of Theorem 1.1.

Each node $v$ of $T$ is associated with several secondary structures, as explained later, and each segment is stored in the secondary structures of exactly one node of $T$. Each node $v$ is also associated with a vertical slab $\sigma_v$; the slab of the root is the whole plane. For an internal node $v$, the slab $\sigma_v$ is divided into at most $B^{\epsilon}$ vertical slabs corresponding to the children of $v$, separated by vertical lines called slab boundaries, such that each child slab contains the same number of vertices of $\Gamma$.

Let $S$ be the set of segments that compose $\Gamma$. Each segment $s \in S$ is stored in the secondary structures associated with the highest node $v$ of $T$ such that $s$ is completely contained in the slab $\sigma_v$ and intersects at least one slab boundary partitioning $\sigma_v$; we say that $s$ is associated with node $v$. Segments associated with leaves are stored explicitly in the corresponding leaf. By construction of the slab boundaries, each leaf stores $O(B)$ segments in $O(1)$ blocks.

Consider a segment $s$ associated with a node $v$ of $T$, and let $\sigma_i$ and $\sigma_j$ be the child slabs of $v$ containing the left and right endpoints of $s$, respectively. We call the part of $s$ inside $\sigma_i$ the left subsegment of $s$, the part inside $\sigma_j$ the right subsegment of $s$, and the rest of $s$ (which spans the child slabs strictly between $\sigma_i$ and $\sigma_j$) its middle subsegment. See Figure 1 for an illustration: the left subsegment lies in the leftmost child slab intersected by $s$, the right subsegment in the rightmost, and the portion in between forms the middle subsegment.

Figure 1: The slab $\sigma_v$ of node $v$ of the interval tree is divided into slabs corresponding to its children. Segment $s$ is associated with node $v$; its left subsegment lies in the child slab containing its left endpoint, its right subsegment in the child slab containing its right endpoint, and its middle subsegment crosses the child slabs in between.
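For illustration, the decomposition described above can be sketched as follows (a minimal Python sketch; the coordinate representation and the function name are our own assumptions):

```python
def split_segment(seg, boundaries):
    """Split seg = ((x1, y1), (x2, y2)), x1 < x2, at the child-slab boundaries it
    crosses, returning its (left, middle, right) subsegments.  `boundaries` are the
    x-coordinates separating the child slabs of the node the segment is associated with."""
    (x1, y1), (x2, y2) = seg
    y_at = lambda x: y1 + (y2 - y1) * (x - x1) / (x2 - x1)

    crossed = [b for b in boundaries if x1 < b < x2]
    if not crossed:           # an associated segment crosses >= 1 boundary; defensive case
        return seg, None, None

    bl, br = crossed[0], crossed[-1]
    left   = ((x1, y1), (bl, y_at(bl)))
    middle = ((bl, y_at(bl)), (br, y_at(br))) if bl != br else None
    right  = ((br, y_at(br)), (x2, y2))
    return left, middle, right
```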

Let $S_v$ be the set of segments associated with a node $v$ of $T$. To store the segments of $S_v$, node $v$ contains the following secondary structures:

  1. A multislab structure which stores the set of middle subsegments.

  2. Left structures, one for each child slab $\sigma_i$, $1 \leq i \leq B^{\epsilon}$, storing the left (sub)segments of slab $\sigma_i$.

  3. Right structures, one for each child slab $\sigma_i$, $1 \leq i \leq B^{\epsilon}$, storing the right (sub)segments of slab $\sigma_i$.

In addition, each internal node $v$ contains an insertion buffer $I_v$ and a deletion buffer $D_v$, each storing up to $B$ segments. A schematic sketch of this node layout is given below.
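Field names in the sketch are illustrative; in the actual structure all of this information is laid out in blocks on disk:

```python
from dataclasses import dataclass, field

@dataclass
class IntervalTreeNode:
    """One internal node v of the primary interval tree T (schematic sketch only)."""
    slab: tuple                                          # (x_left, x_right) of sigma_v
    boundaries: list                                     # boundaries of the <= B^eps child slabs
    children: list                                       # child nodes, one per child slab
    multislab: object = None                             # multislab structure for middle subsegments
    left_structs: list = field(default_factory=list)     # one left structure per child slab
    right_structs: list = field(default_factory=list)    # one right structure per child slab
    ins_buffer: list = field(default_factory=list)       # I_v: up to B buffered insertions
    del_buffer: list = field(default_factory=list)       # D_v: up to B buffered deletions
```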

Construction and Space Usage.

Buffers $I_v$ and $D_v$ fit in $O(1)$ blocks. By Theorems 1.2 and 1.3, a secondary structure storing $m$ segments uses $O(m)$ space. Since each segment of $S$ is stored in at most 3 secondary structures, the secondary structures of a node $v$ use space linear in the number of segments associated with $v$, and our data structure uses $O(N)$ space overall. The interval tree $T$ can be constructed in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os. By Theorems 1.2 and 1.3, once the segments are sorted, all secondary structures of a node of $T$ can be constructed in a number of I/Os linear in the number of blocks occupied by its associated segments. Thus, all secondary structures of the tree can be constructed in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os in total.

Queries.

To answer a vertical ray-shooting query for a point $q$, we traverse a root-to-leaf path of $T$ guided by the $x$-coordinate of $q$, while maintaining a segment $s^*$ (initialized to null) which is the answer to the query among the segments associated with the nodes traversed so far. At each node $v$ visited along this path, we perform a vertical ray-shooting query on the secondary structures of $v$ relevant to $q$ (the multislab structure and the left and right structures of the child slab containing $q$) and update $s^*$ if a closer segment above $q$ is found. Then, we remove all segments that appear in both $I_v$ and $D_v$. Next, we ray-shoot among the segments stored in $I_v$ and update $s^*$ if necessary. Finally, we determine which child $c$ of $v$ to visit next and flush any segments of $I_v$ and $D_v$ that are contained in the slab of $c$ to $I_c$ and $D_c$, respectively. We then continue the process at $c$. Once a leaf is reached, we simply compare the segments it contains with $s^*$ and return the closest segment above $q$ among them and $s^*$.
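The basic geometric primitive used at every visited node (and at the leaf) is ray shooting within a block's worth of segments; a minimal sketch, with an assumed coordinate representation:

```python
def first_hit_above(q, segments):
    """Return the segment of `segments` first hit by the vertical upward ray from
    q = (qx, qy), or None.  Segments are ((x1, y1), (x2, y2)) tuples with x1 <= x2."""
    qx, qy = q
    best, best_y = None, float("inf")
    for (x1, y1), (x2, y2) in segments:
        if not (x1 <= qx <= x2):
            continue                                   # the vertical line x = qx misses this segment
        y = y1 if x1 == x2 else y1 + (y2 - y1) * (qx - x1) / (x2 - x1)
        if qy <= y < best_y:                           # hit at or above q, lower than current best
            best, best_y = ((x1, y1), (x2, y2)), y
    return best
```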

Bounding the query cost: Since any root-to-leaf path of $T$ has length $O(\log_B N)$ and each secondary data structure supports ray-shooting queries in $O(\log_B N)$ I/Os (by Theorems 1.2 and 1.3), a query is answered in $O(\log_B^2 N)$ I/Os. Note that at each node of the root-to-leaf path the operations involving $I_v$ and $D_v$ require $O(1)$ I/Os, so they increase the total cost by at most a constant factor.

Insertions.

To handle insertions, we use the insertion buffers stored in the nodes of $T$. When a new segment is inserted, we place it in the insertion buffer $I_r$ of the root $r$. Let $v$ be an internal node with children $c_1,\dots,c_k$. Whenever $I_v$ becomes full, it is flushed: segments of $I_v$ that cross at least one slab boundary partitioning $\sigma_v$ are inserted into the secondary structures of $v$; a segment contained in the slab of a child $c_i$ is inserted into $I_{c_i}$. If $I_v$ becomes full for a node $v$ whose children are leaves, we insert the forwarded segments explicitly into the corresponding leaves. When a leaf becomes full, we restructure the tree using split operations on full nodes.
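A toy sketch of buffered insertion with cascading flushes (the class, the capacity constant and the routing logic are simplified assumptions, not the paper's implementation):

```python
B_CAP = 4  # toy stand-in for the real buffer capacity of Theta(B) segments

class Node:
    def __init__(self, x_lo, x_hi, children=None):
        self.x_lo, self.x_hi = x_lo, x_hi      # this node's slab
        self.children = children or []         # child nodes partitioning the slab
        self.ins_buffer = []                   # I_v: buffered insertions
        self.stored = []                       # stand-in for the node's secondary structures

    def boundaries(self):                      # interior slab boundaries of this node
        return [c.x_lo for c in self.children[1:]]

def insert(root, seg):
    """Buffered insertion of seg = (x1, x2); only the x-extent matters for routing."""
    root.ins_buffer.append(seg)
    flush_if_full(root)

def flush_if_full(v):
    if len(v.ins_buffer) <= B_CAP:
        return
    buf, v.ins_buffer = v.ins_buffer, []
    for (x1, x2) in buf:
        if any(x1 < b < x2 for b in v.boundaries()):
            v.stored.append((x1, x2))          # crosses a slab boundary: stays at v
        elif v.children:
            c = next(c for c in v.children if c.x_lo <= x1 and x2 <= c.x_hi)
            c.ins_buffer.append((x1, x2))      # forward to the child whose slab contains it
        else:
            v.stored.append((x1, x2))          # leaf: store explicitly
    for c in v.children:
        flush_if_full(c)                       # cascade if a child's buffer overflowed
```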

Bounding the insertion cost: We compute the amortized cost of an insertion by considering three components:

  1. The cost of moving segments between insertion buffers. Whenever an insertion buffer gets full, it forwards its segments to the buffers of its children using $O(B^{\epsilon})$ I/Os. Since a flush occurs only after $\Omega(B)$ insertions into $I_v$, the amortized cost of such operations is $O(\frac{1}{B^{1-\epsilon}})$ per segment per level. Each segment moves through at most $O(\log_B N)$ insertion buffers before it is inserted into the secondary structures of a node (or into a leaf). Thus the amortized cost for moving between buffers is $O(\frac{1}{B^{1-\epsilon}}\log_B N)$.

  2. The insertion cost in the secondary structures. By Theorems 1.2 and 1.3, insertions into the secondary structures require $O(\frac{1}{B^{1-\epsilon}}\log_B N)$ amortized I/Os.

  3. The cost of restructuring the tree after insertions, when a leaf becomes full. We show in Section 5 that, by slightly modifying our primary interval tree data structure, the restructuring requires at most $O(\frac{1}{B^{1-\epsilon}}\log_B N)$ amortized I/Os.

We conclude that our data structure supports insertions in $O(\frac{1}{B^{1-\epsilon}}\log_B N)$ amortized I/Os.

Deletions.

To support deletions, we use the deletion buffers stored in the nodes of $T$. To delete a segment $s$, we first check whether $s$ is in the insertion buffer $I_r$ of the root, and if so we delete it from there; otherwise we store $s$ in $D_r$. Similarly to insertions, whenever $D_v$ becomes full for some internal node $v$ with children $c_1,\dots,c_k$, we flush $D_v$: segments of $D_v$ crossing at least one slab boundary partitioning $\sigma_v$ are deleted from the corresponding secondary structures of $v$; the remaining segments are moved to the buffers $D_{c_i}$ of the children whose slabs contain them, and if such a segment is also present in $I_{c_i}$, we delete it from both buffers. If $D_v$ becomes full for a node $v$ whose children are leaves, we delete the forwarded segments explicitly from the corresponding leaves.

Bounding the deletion cost: The deletion cost has three components:

  1. Moving segments between the deletion buffers. Using the same argument as for insertions, this requires $O(\frac{1}{B^{1-\epsilon}}\log_B N)$ amortized I/Os.

  2. The cost of deletions in the secondary structures. By Theorems 1.2 and 1.3, deletions in the secondary structures require $O(\frac{1}{B^{1-\epsilon}}\log_B N)$ amortized I/Os.

  3. The cost of restructuring the tree. After every $\Theta(N)$ deletions we rebuild the structure using $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os, which gives an amortized restructuring cost of $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ I/Os.

Overall, deletions are supported in $O(\frac{1}{B^{1-\epsilon}}\log_B N)$ amortized I/Os.

3 Left and Right Structures

In this section we prove Theorem 1.2. Given $K$ segments all of whose right (left) endpoints lie on a single vertical line, we construct a data structure which answers vertical ray-shooting queries on those segments in $O(\log_B K)$ I/Os and supports insertions and deletions in $O(\frac{1}{B^{1-\epsilon}}\log_B K)$ amortized I/Os, for a constant $0 < \epsilon < 1$.

We describe the structure for the case where the right endpoints of all segments in the given set have the same $x$-coordinate; the case where the left endpoints share the same $x$-coordinate is completely symmetric. For a segment $s$, we refer to the $y$-coordinate of its right endpoint as the $y$-coordinate of $s$. Similarly, we define the $x$-coordinate of $s$ to be the $x$-coordinate of its left endpoint.

The Data Structure.

We store all segments in an augmented B-tree which supports vertical ray-shooting queries, insertions and deletions. The degree of each internal node is between $B^{\epsilon}/2$ and $B^{\epsilon}$, except for the root, which may have degree in the range $[2, B^{\epsilon}]$, and each leaf stores $\Theta(B)$ elements. For a node $u$, let $T_u$ be the subtree rooted at $u$. The segments are stored in the leaves sorted according to their $y$-coordinates, so each subtree $T_u$ corresponds to a range of $y$-coordinates, which we call the $y$-range of node $u$. Let $u$ be an internal node with children $u_1,\dots,u_k$. Node $u$ stores the following information:

  1. A segment buffer $S_u$ of capacity $B$ which contains the segments in the $y$-range of $u$ whose left endpoints have the smallest $x$-coordinates (i.e., the segments that extend farthest from the vertical line) and that are not stored in the buffer $S_w$ of any ancestor $w$ of $u$. In other words, the tree together with the segment buffers $S_u$ forms an external memory priority search tree [6].

  2. An insertion buffer $I_u$ and a deletion buffer $D_u$, each storing up to $B$ segments.

  3. A list $L_u$ that contains, for each child $u_i$, the segment with minimum $x$-coordinate stored in $S_{u_i}$. We call this the minimal segment of child $u_i$.

The data structure satisfies the following invariants: for each node $u$, either $|S_u| = B$ or, if $|S_u| < B$, then $I_u$ and $D_u$ are empty and all buffers stored at the descendants of $u$ are empty. Also, for each node $u$, the buffers $I_u$ and $D_u$ are disjoint. Finally, for a leaf $\ell$, $I_\ell$ and $D_\ell$ are empty.

Construction and Space Usage.

Overall, the buffers and lists of each node contain $O(B)$ segments, i.e., they can be stored in $O(1)$ blocks. Thus the structure can be stored in $O(K/B)$ blocks, i.e., it requires $O(K)$ space. Construction requires $O(\frac{K}{B}\log_{M/B}\frac{K}{B})$ I/Os, since we need to sort all segments according to their $y$-coordinates. If the segments are already sorted by $y$-coordinate, the structure can be created in $O(K/B)$ I/Os.

Queries in the static structure.

To get a feel for how our structure supports queries, we first show how to perform queries in the static case, i.e., assuming there are no insertions and deletions and all buffers $I_u$ and $D_u$ are empty. Later we give a precise description of how queries are performed in the fully dynamic structure.

Let $q$ be the query point, $r_{up}$ the ray emanating from $q$ in the $+y$ direction and $r_{down}$ the ray emanating from $q$ in the $-y$ direction. We query the structure by finding the first segment hit by each of $r_{up}$ and $r_{down}$. We keep two pointers, $u_{up}$ and $u_{down}$, initialized at the root. We also keep the closest segments $s_{up}$ and $s_{down}$ seen so far in the $+y$ and $-y$ direction respectively (both initialized to null). At each step, we update both pointers to move from a node of depth $i$ to a node of depth $i+1$. While at level $i$, $u_{up}$ and $u_{down}$ might coincide, or one of them might be undefined (set to null).

We now describe the query algorithm. We start at the root and advance downwards, while updating $u_{up}$, $u_{down}$, $s_{up}$ and $s_{down}$. When at depth $i$, we find the first segment hit by $r_{up}$ among the segments of $S_{u_{up}}$ and $S_{u_{down}}$ and update $s_{up}$ if necessary (i.e., if this segment is the first hit by $r_{up}$ among all segments seen so far). Similarly, we ray-shoot on $r_{down}$ among $S_{u_{up}}$ and $S_{u_{down}}$ and update $s_{down}$ if necessary. To determine in which nodes of depth $i+1$ to continue the search, we ray-shoot on $r_{up}$ and $r_{down}$ among the segments of $L_{u_{up}}$ and $L_{u_{down}}$ (i.e., all minimal segments of the children of $u_{up}$ and $u_{down}$). Let $w_{up}$ be the child whose minimal segment is the first segment hit by $r_{up}$ (if it exists). If the $y$-range of $w_{up}$ is higher than the $y$-coordinate of $s_{up}$, or if $w_{up}$ does not exist, we leave $u_{up}$ undefined for level $i+1$; otherwise, we set $u_{up} = w_{up}$. Similarly, let $w_{down}$ be the child whose minimal segment is the first one hit by $r_{down}$ (if such a segment exists). If the $y$-range of $w_{down}$ is lower than the $y$-coordinate of $s_{down}$, or if $w_{down}$ does not exist, we leave $u_{down}$ undefined for level $i+1$; otherwise we set $u_{down} = w_{down}$.

If both $u_{up}$ and $u_{down}$ are undefined for the next level $i+1$, we stop the procedure and output $s_{up}$ as the result of the vertical ray-shooting query. Otherwise we repeat the same procedure at the next level. When we reach the leaf level, we find the first segment hit by $r_{up}$ among the segments stored in the leaves $u_{up}$ and $u_{down}$, update $s_{up}$ if necessary, and output $s_{up}$ as the result of the query.

Figure 2: Example of the query algorithm in the left structure. The left column shows the segments stored in the structure, the query point $q$ and the vertical ray emanating from $q$; the right column shows the buffers of the nodes of the tree. Red segments are stored in the root. For each child node, the green segment is its minimal segment, i.e., the one stored in the parent's list. Ray-shooting upwards on $r_{up}$ among the green segments identifies a child, and we set $u_{up}$ to it. Note that the correct answer to the query is not stored in that child's buffer, i.e., maintaining only $u_{up}$ would produce an incorrect answer. Thus, our algorithm also ray-shoots downwards and sets $u_{down}$ to the child whose minimal segment is first hit by $r_{down}$. Then, by ray-shooting on $r_{up}$ among the buffers of $u_{up}$ and $u_{down}$, the correct first segment above $q$ is found.

Remark: The reader might wonder why we answer vertical ray-shooting queries in both directions and keep two pointers $u_{up}$ and $u_{down}$. Isn't it sufficient to answer queries in one direction and keep one pointer at each step? Figure 2 shows an example where this is not the case: maintaining only the pointer $u_{up}$ would result in an incorrect answer.

The formal proof of correctness of this query algorithm is deferred to Appendix A.

Bounding the query cost: In each step we move down the tree by one level and perform operations that require $O(1)$ I/Os, as we only examine segments stored at the current nodes $u_{up}$ and $u_{down}$. Since the height of the tree is $O(\log_B K)$, a query is answered in $O(\log_B K)$ I/Os.

Insertions.

Assume we want to insert a segment $s$ into the left structure. If the $x$-coordinate of $s$ is smaller than the maximum $x$-coordinate of a segment stored in the buffer $S_r$ of the root $r$, we insert $s$ into $S_r$; otherwise we store $s$ in the insertion buffer $I_r$ of the root. Note that inserting $s$ into $S_r$ might cause $S_r$ to overflow (i.e., $|S_r| > B$); in that case we move the segment of $S_r$ with the maximum $x$-coordinate into the insertion buffer $I_r$.

Let $u$ be an internal node with children $u_1,\dots,u_k$. Whenever the insertion buffer $I_u$ becomes full, we flush it, moving its segments to the buffers of the corresponding children. For a segment $s$ that should be stored in child $u_i$, we repeat the same procedure as at the root: check whether $s$ has a smaller $x$-coordinate than the maximum $x$-coordinate of a segment stored in $S_{u_i}$; if so, store $s$ in $S_{u_i}$, otherwise store it in $I_{u_i}$. If $S_{u_i}$ overflows, we move its last segment (i.e., the one with maximum $x$-coordinate) into $I_{u_i}$. Also, if $s$ gets stored in $S_{u_i}$ and its $x$-coordinate is smaller than that of all previous segments of $S_{u_i}$, we update the minimal segment of $u_i$ in the list $L_u$.
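The per-node buffer rule can be sketched as follows (the tuple representation and the return convention are assumptions made for the sketch):

```python
def insert_into_segment_buffer(buffer, seg, capacity):
    """Insert seg into a node's segment buffer S_u, keeping the `capacity` segments
    that extend farthest to the left (smallest left-endpoint x).  Returns the evicted
    segment, which the caller places into the insertion buffer I_u, or None.
    Segments are ((xl, yl), (xr, yr)) tuples."""
    buffer.append(seg)
    if len(buffer) <= capacity:
        return None
    evicted = max(buffer, key=lambda s: s[0][0])   # largest left-endpoint x extends the least
    buffer.remove(evicted)
    return evicted
```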

When a leaf $\ell$ overflows, we split it into two leaves $\ell_1$ and $\ell_2$, as in standard B-trees. Note that this might cause recursive splits of nodes at greater height.

Bounding the insertion cost: Flushing a buffer $I_u$ and forwarding its segments to the buffers $S_{u_i}$ and $I_{u_i}$ of the children requires $O(B^{\epsilon})$ I/Os. Since $I_u$ becomes full only after at least $B$ insertions, the amortized cost of moving a segment from $I_u$ to the buffers of a child of $u$ is $O(\frac{1}{B^{1-\epsilon}})$. Each inserted segment moves between buffers along a root-to-leaf path of length $O(\log_B K)$, thus the total amortized cost for moves between buffers is $O(\frac{1}{B^{1-\epsilon}}\log_B K)$ I/Os. The restructuring due to splitting nodes requires, as in standard B-trees, an amortized number of I/Os dominated by this bound. Thus, insertions are supported in $O(\frac{1}{B^{1-\epsilon}}\log_B K)$ amortized I/Os.

Deletions.

To delete a segment $s$, we first check whether it is stored in the buffers $S_r$ or $I_r$ of the root; if so, we delete it from there. Otherwise, we insert $s$ into the deletion buffer $D_r$ of the root.

Let $u$ be an internal node with children $u_1,\dots,u_k$. Whenever $D_u$ becomes full we flush it, move its segments to the corresponding children and repeat the same procedure: for a segment $s$ which moves to child $u_i$, we check whether it is stored in $S_{u_i}$ or $I_{u_i}$; if so, we delete it and update the minimal segment of $u_i$ in $L_u$ if necessary. Otherwise, we store $s$ in the deletion buffer $D_{u_i}$. If a segment buffer $S_w$ of some node $w$ underflows (i.e., $|S_w| < B$), we refill it using segments stored in the buffers of the children of $w$; the segments moved to $S_w$ are deleted from the children's buffers and all necessary updates to the lists are performed. This might cause underflowing segment buffers at the children of $w$; we handle those in the same way. In case all children buffers become empty and still $|S_w| < B$, we move segments from $I_w$ to $S_w$ until either $|S_w| = B$ or $I_w$ becomes empty.

Bounding the deletion cost: Deletion cost consists of three components:

  1. Cost of moving segments between buffers: Using the same analysis as for insertions, this requires $O(\frac{1}{B^{1-\epsilon}}\log_B K)$ amortized I/Os.

  2. Cost due to refilling the segment buffers $S_u$: For a node $u$ with children $u_1,\dots,u_k$, refilling the buffer $S_u$ from $S_{u_1},\dots,S_{u_k}$ requires $O(B^{\epsilon})$ I/Os and moves up to $B$ segments one level higher, so the amortized cost of moving a segment up by one level is $O(\frac{1}{B^{1-\epsilon}})$. Since the tree has height $O(\log_B K)$, over a sequence of $d$ deletions the total number of single-level moves of segments is $O(d \log_B K)$. Thus the total cost due to refilling is at most $O(\frac{d}{B^{1-\epsilon}}\log_B K)$, which implies that the amortized cost is $O(\frac{1}{B^{1-\epsilon}}\log_B K)$.

    A corner case not taken into account above is when the total number of segments stored in the buffers $S_{u_1},\dots,S_{u_k}$ is less than $B$, so that a refill performs $O(B^{\epsilon})$ I/Os but moves only few segments. In this case it is not valid to claim that the amortized cost of refilling $S_u$ is $O(\frac{1}{B^{1-\epsilon}})$ per moved segment. To take care of this, we use a simple amortization trick: we double-charge all I/Os performed for insertions. This way, for each buffer $S_w$ there is a saved I/O from the time its segments moved from the parent down to node $w$. We use this additional saved I/O when $S_w$ gets emptied due to the refilling of its parent's buffer.

  3. Restructuring requires $O(\frac{1}{B}\log_{M/B}\frac{K}{B})$ amortized I/Os, by rebuilding the structure after $\Theta(K)$ deletions.

Overall, the amortized deletion cost is $O(\frac{1}{B^{1-\epsilon}}\log_B K)$ I/Os.

Queries in the dynamic structure.

We now describe how to extend our query algorithm to the dynamic case. In order to ensure that all nodes visited are up to date and we do not miss any updates sitting in the insertion/deletion buffers, when moving a pointer from a node $u$ to its child $u_i$, we flush the pending deletions of $D_u$ that belong to the $y$-range of $u_i$: we delete the segments of $D_u$ that are stored in $S_{u_i}$ or $I_{u_i}$, store the remaining such segments in $D_{u_i}$, and update $L_u$ if necessary. We then delete any segments found in both $I_{u_i}$ and $D_{u_i}$. Finally, we compare the segments in $I_{u_i}$ with $s_{up}$ (recall this is the first segment hit by $r_{up}$ among the segments considered so far) and, if any segment in $I_{u_i}$ would be hit by $r_{up}$ before $s_{up}$, we replace $s_{up}$ with it. Clearly this increases the total cost by at most a constant factor compared to the static case, thus the query cost is $O(\log_B K)$ I/Os.

4 Multislab Structure

In this section we prove Theorem 1.3. Assume that we are given a set of $K$ non-intersecting segments with endpoints on at most $B^{\epsilon}$ vertical lines, for some constant $0 < \epsilon < 1$. We show that these segments can be stored in a data structure which uses $O(K)$ space, supports vertical ray-shooting queries in $O(\log_B K)$ I/Os, and updates in $O(\frac{1}{B^{1-\epsilon}}\log_B K)$ amortized I/Os. This data structure can be constructed in $O(\frac{K}{B}\log_{M/B}\frac{K}{B})$ I/Os. We call this data structure a multislab structure.

For notational convenience, denote the vertical lines by $\ell_1, \dots, \ell_L$, where $L \leq B^{\epsilon}$, so that the endpoints of the segments lie on at most $L$ vertical lines. For $1 \leq i < L$, let $\sigma_i$ denote the vertical slab defined by the lines $\ell_i$ and $\ell_{i+1}$. We will show that queries are supported in $O(\log_B K)$ I/Os and updates in $O(\frac{1}{B^{1-\epsilon}}\log_B K)$ amortized I/Os; Theorem 1.3 then follows.

Total Order.

In order to implement the multislab structure we need to maintain an ordering of the segments based on their $y$-coordinates. Using standard approaches (see e.g. [7, 5]) we can define a partial order on segments that are intersected by a common vertical line. Arge et al. [8] showed how to extend this partial order into a total order on all the segments (not necessarily all intersecting the same vertical line) in $O(\frac{K}{B}\log_{M/B}\frac{K}{B})$ I/Os. We use this total order to create our multislab structure.
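For intuition, the underlying "aboveness" comparison for two segments crossed by a common vertical line can be sketched as follows (a simplified illustration of the standard partial order, not the construction of [8]):

```python
def below(a, b):
    """Return True if segment a lies below segment b at a common x-coordinate.
    Segments are ((x1, y1), (x2, y2)) with x1 < x2; assumes non-vertical,
    non-crossing segments whose x-ranges overlap in more than a point."""
    lo = max(a[0][0], b[0][0])
    hi = min(a[1][0], b[1][0])
    x = (lo + hi) / 2                     # any x in the common range gives the same answer
    y_at = lambda s: s[0][1] + (s[1][1] - s[0][1]) * (x - s[0][0]) / (s[1][0] - s[0][0])
    return y_at(a) < y_at(b)
```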

The Data Structure.

We store the ordered segments in an augmented B-tree which supports queries, insertions and deletions. The degree of each internal node is between $B^{\epsilon}/2$ and $B^{\epsilon}$, except for the root, which may have degree in the range $[2, B^{\epsilon}]$. Leaves store $\Theta(B)$ elements. For a node $u$, let $T_u$ be the subtree rooted at $u$, and let $u_1,\dots,u_k$ be the children of an internal node $u$. Node $u$ stores the following information:

  1. A segment buffer $S_u$ of capacity $B$ which contains the highest (according to the total order) segments stored in $T_u$ that are not stored in the buffer $S_w$ of any ancestor $w$ of $u$. In other words, the tree together with the segment buffers $S_u$ forms an external memory priority search tree [6].

  2. An insertion buffer $I_u$ and a deletion buffer $D_u$ for buffered updates.

  3. A list $L_u$ which contains, for each slab $\sigma_j$, $1 \leq j < L$, and each child $u_i$, $1 \leq i \leq k$, the highest segment (according to the total order) crossing slab $\sigma_j$ that is stored in $T_{u_i}$.

The data structure satisfies the following invariants: i) for each node $u$, either $|S_u| = B$ or, if $|S_u| < B$, then $I_u$ and $D_u$ are empty and all buffers of the descendants of $u$ are empty; ii) for each node $u$, the buffers $I_u$ and $D_u$ are disjoint; and iii) for every leaf $\ell$, $I_\ell$ and $D_\ell$ are empty.

Construction and Space Usage.

Overall, the buffers of each node contain $O(B^{1+\epsilon})$ segments and the list $L_u$ contains at most $B^{2\epsilon}$ segments, i.e., they can be stored in $O(B^{\epsilon})$ blocks. Thus the structure can be stored in $O(K/B)$ blocks, i.e., it requires $O(K)$ space. The structure can be constructed in $O(\frac{K}{B}\log_{M/B}\frac{K}{B})$ I/Os. If the segments are already sorted according to a total order, construction requires $O(K/B)$ I/Os.

Insertions.

To insert a new segment $s$ we need to determine its position in the total order. Clearly, we cannot afford to recompute the total order from scratch, as this costs $O(\frac{K}{B}\log_{M/B}\frac{K}{B})$ I/Os. Instead, we break $s$ into at most $L \leq B^{\epsilon}$ unit segments, each crossing exactly one slab. In particular, if $s$ crosses slabs $\sigma_i, \dots, \sigma_j$, we break it into unit segments $s_i, \dots, s_j$, where $s_t$ crosses slab $\sigma_t$. We call all such unit segments stored in the structure new segments; the rest of the stored segments are called old segments. Now we can easily update the total order: a unit segment $s_t$ needs to be compared only with segments crossing slab $\sigma_t$; if $s^-$ and $s^+$ are the predecessor and successor of $s_t$ among the segments crossing $\sigma_t$, we place $s_t$ at an arbitrary position between $s^-$ and $s^+$ in the total order. This way a valid total order is always maintained.
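A minimal sketch of this splitting step (the representation of segments and lines is an assumption of the sketch):

```python
def unit_segments(seg, lines):
    """Break seg = ((x1, y1), (x2, y2)) into unit segments, one per slab it crosses.
    `lines` are the sorted x-coordinates of the multislab's vertical lines; the
    endpoints of seg are assumed to lie on lines of `lines`, and seg is non-vertical."""
    (x1, y1), (x2, y2) = seg
    y_at = lambda x: y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    cuts = [x1] + [x for x in lines if x1 < x < x2] + [x2]
    return [((a, y_at(a)), (b, y_at(b))) for a, b in zip(cuts, cuts[1:])]
```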

We now describe the insertion algorithm. When a segment $s$ is inserted, we first break it into unit segments $s_i, \dots, s_j$. For each unit segment, we first check whether it should be inserted in the segment buffer $S_r$ of the root: if so, we store it there; otherwise we store it in the insertion buffer $I_r$ of the root. If $S_r$ overflows (i.e., $|S_r| > B$), we move its last segment (according to the total order) to $I_r$. Let $u$ be an internal node with children $u_1,\dots,u_k$. Each time $I_u$ becomes full, we flush it and move its segments to the children. For a segment moving from $u$ to $u_i$, we first check whether it is greater (according to the total order) than the minimum segment stored in $S_{u_i}$; if so, we store it in $S_{u_i}$, otherwise we store it in $I_{u_i}$. If $S_{u_i}$ overflows (i.e., $|S_{u_i}| > B$), we move its last segment to $I_{u_i}$. We also update the information in the list $L_u$ if necessary. If $I_{u_i}$ becomes full, we repeat the same procedure recursively.

When a leaf $\ell$ overflows, we split it into two leaves $\ell_1$ and $\ell_2$, as in standard B-trees. Note that this might cause recursive splits of nodes at greater height.

Bounding the insertion cost: Flushing an insertion buffer and moving its contents to the buffers of the children requires $O(B^{\epsilon})$ I/Os. Each inserted segment contributes at most $B^{\epsilon}$ unit segments, and the buffer capacities are chosen so that the amortized cost of pushing a segment's unit segments down one level is $O(\frac{1}{B^{1-\epsilon}})$ I/Os. Since each unit segment is eventually stored at a node of depth $O(\log_B K)$, the amortized cost until a segment is fully inserted is $O(\frac{1}{B^{1-\epsilon}}\log_B K)$ I/Os. The restructuring due to splitting full nodes requires, as in standard B-trees, an amortized number of I/Os dominated by this bound. Overall, insertions require $O(\frac{1}{B^{1-\epsilon}}\log_B K)$ amortized I/Os.

Linear space usage: To avoid an increase in space usage due to the unit segments, whenever there are $\Theta(K)$ new unit segments we rebuild the whole structure. This way the space usage remains $O(K)$. The rebuilding requires $O(\frac{K}{B}\log_{M/B}\frac{K}{B})$ I/Os, i.e., an amortized number of I/Os dominated by the insertion bound above, and thus it does not asymptotically increase the insertion time.

Deletions.

The process of deleting a segment $s$ is similar to insertion: we break $s$ into at most $B^{\epsilon}$ unit segments $s_i, \dots, s_j$, where $\sigma_i$ and $\sigma_j$ are the leftmost and rightmost slabs spanned by $s$, and apply the deletion procedure to each of these unit segments separately.

The deletion algorithm for a unit segment is analogous to that of the left (right) structure of Section 3; for completeness we describe it here. To delete a unit segment $s_t$, we first check whether it is stored in the buffers $S_r$ or $I_r$ of the root; if so, we delete it. Otherwise, we insert $s_t$ into the deletion buffer $D_r$ of the root. Let $u$ be an internal node with children $u_1,\dots,u_k$. Whenever $D_u$ becomes full we flush it, forward its segments to the corresponding children and repeat the same procedure: for a segment which moves to child $u_i$, we check whether it is stored in $S_{u_i}$ or $I_{u_i}$; if so, we delete it and update the list $L_u$ if necessary. Otherwise, we store it in the deletion buffer $D_{u_i}$.

If a segment buffer $S_u$ underflows (i.e., $|S_u| < B$), we refill it using segments from the buffers $S_{u_1},\dots,S_{u_k}$ of its children; the segments moved to $S_u$ are deleted from the children's buffers and $L_u$ is updated if needed. This might cause underflowing segment buffers at the children; we handle those in the same way. In case all children buffers become empty and still $|S_u| < B$, we move segments from $I_u$ to $S_u$ until either $|S_u| = B$ or $I_u$ becomes empty. After $\Theta(K/B^{\epsilon})$ deletions we rebuild our data structure.

Remark: Note that here we split every segment to be deleted into unit segments, even though the old segments are not unit segments and are stored as whole segments in the data structure. This does not affect our algorithm: whenever the first unit segment of an old segment $s$ reaches the node $u$ at which $s$ is stored, we delete $s$ from $u$ and remove that unit segment from the deletion buffers. The remaining unit segments of $s$ will eventually reach node $u$ and find that $s$ has already been deleted from $u$; at this point they are simply discarded.

Bounding the deletion cost: The analysis of the deletion cost is identical to the analysis of deletions in the structure of Section 3. Since each segment breaks into at most $B^{\epsilon}$ unit segments, we get an amortized deletion cost of $O(\frac{1}{B^{1-\epsilon}}\log_B K)$ I/Os.

Linear space usage: Similarly to insertions, we need to make sure that the total space used does not increase asymptotically due to the at most $B^{\epsilon}$ unit segments placed in deletion buffers for each deleted segment $s$. The deletion buffers have total capacity $\Theta(K)$ unit segments. Since we rebuild the structure after $\Theta(K/B^{\epsilon})$ deletions, there are at most $O(K)$ unit segments stored in deletion buffers at any time, i.e., the deletion buffers never get totally full and the total space used remains $O(K)$.

Queries.

Let $q$ be the query point and $r_{up}$ the vertical ray emanating from $q$ in the $+y$ direction. Let $\sigma_t$ be the slab containing $q$; we can find $\sigma_t$ in $O(1)$ I/Os by storing all slab boundaries in a single block. We perform a root-to-leaf search, maintaining the first segment $s^*$ hit by $r_{up}$ among the segments seen so far. While visiting a node $u$ we do the following: (i) perform a vertical ray-shooting query from $q$ among the segments stored in the buffers $S_u$ and $I_u$, and update $s^*$ if necessary; (ii) move to the child $u_i$ that contains the first segment hit by $r_{up}$ among the entries of the list $L_u$ for slab $\sigma_t$ (see Figure 3); and (iii) find in $I_u$ (resp. $D_u$) the segments crossing slab $\sigma_t$ that should be stored (according to the total order) in $T_{u_i}$ and move them to $S_{u_i}$ or $I_{u_i}$ (resp. delete them from $S_{u_i}$ or $I_{u_i}$, or store them in $D_{u_i}$). If a segment inserted into $I_{u_i}$ is also stored in $D_{u_i}$, we delete it from both buffers.

Once we reach a leaf $\ell$, we first delete from $\ell$ the segments that appear in the deletion buffer of its parent and then perform a ray-shooting query among the segments stored in $\ell$, updating $s^*$ if necessary.

Bounding the query cost: Since we follow a root-to-leaf path and at each level we perform $O(1)$ I/Os, a ray-shooting query for a point $q$ is answered in $O(\log_B K)$ I/Os.

Figure 3: Vertical ray-shooting queries in the multislab structure. The query point $q$ lies in slab $\sigma_t$, and $r_{up}$ is the vertical ray emanating from $q$. While at a node $u$, to decide in which child to continue the search we examine the segments stored in the list $L_u$ for slab $\sigma_t$; among them, the first one hit by $r_{up}$ determines the child at which the search continues.

5 Counting the Restructuring Cost

In Section 2 we proved Theorem 1.1 (the query and update bounds of the overall structure) without taking into account the cost of restructuring the interval tree $T$ due to insertions that cause leaves to become full. In this section we show that Theorem 1.1 holds when this restructuring is taken into account as well.

When a leaf becomes full we need to split it. This split might in turn cause the split of the parent and possibly propagate up the tree, thus causing part of the tree to need rebalancing. While rebalancing, we need to perform updates in the secondary structures so that they remain consistent with the updated nodes of the interval tree $T$. In this section, we show that we can slightly modify our data structure such that all updates in the secondary structures can be performed in at most $O(\frac{1}{B^{1-\epsilon}}\log_B N)$ amortized I/Os. This implies that Theorem 1.1 holds.

Our Approach.

We use a variant of the weight-balanced B-tree of [9]. Each leaf stores at most $B$ segment endpoints. Let $v$ be a node at height $h$ with parent $p$. Node $v$ stores $\Theta(B^{1+h\epsilon})$ elements in its subtree $T_v$. We will show that if node $v$ splits, then we can perform all updates needed in the secondary structures in $O(B^{h\epsilon})$ I/Os. This implies that a split requires $O(1/B)$ amortized I/Os per insertion, since after a restructuring there must be $\Omega(B^{1+h\epsilon})$ insertions into $T_v$ until the next split of $v$ is needed. Since each insertion can cause $O(\log_B N)$ splits, we get an amortized restructuring cost of $O(\frac{1}{B}\log_B N)$ I/Os per insertion.

Splitting a node.

Node $v$ splits into two new nodes $v_1$ and $v_2$. The slab $\sigma_v$ is divided into two slabs with a new slab boundary $\ell$; see Figure 4. To capture this change and update our data structure, we need to perform updates in the secondary structures of the parent of $v$ and construct the secondary structures of $v_1$ and $v_2$. We describe these updates in detail and show that they can be performed in $O(B^{h\epsilon})$ I/Os. In our analysis we use the fact that any secondary structure (multislab or left/right) storing $m$ segments can be scanned in $O(m/B)$ I/Os.

Figure 4: Splitting a node $v$ into $v_1$ and $v_2$: the slab $\sigma_v$ is divided into slabs $\sigma_{v_1}$ and $\sigma_{v_2}$ with boundary $\ell$.

Updates in the secondary structures of $v$.

We begin with the construction of the left/right structures of $v_1$ and $v_2$ using the previous left/right structures of $v$. We describe the creation of the left structures of $v_1$ and $v_2$; the right structures are handled symmetrically. Segments that were stored in a left structure of $v$ and do not cross $\ell$ (see Figure 5) are stored in the corresponding left structure of $v_1$ or $v_2$; segments that cross $\ell$ (see segment