
# The Space Complexity of 2-Dimensional Approximate Range Counting and Combinatorial Discrepancy

*A preliminary version of this paper appeared in SODA'13.*

Zhewei Wei (School of Information, Renmin University of China; zhewei@ruc.edu.cn)
Ke Yi (Hong Kong University of Science and Technology; yike@cs.ust.hk)
###### Abstract

We study the problem of 2-dimensional orthogonal range counting with additive error. Given a set $P$ of $n$ points drawn from an $n\times n$ grid and an error parameter $\varepsilon$, the goal is to build a data structure, such that for any orthogonal range $R$, it can return the number of points in $P\cap R$ with additive error $\varepsilon n$. A well-known solution for this problem is the $\varepsilon$-approximation, which is a subset $A\subseteq P$ that can estimate the number of points in $P\cap R$ with the number of points in $A\cap R$. It is known that an $\varepsilon$-approximation of size $O(\frac{1}{\varepsilon}\log^{2.5}\frac{1}{\varepsilon})$ exists for any $P$ with respect to orthogonal ranges, and the best lower bound is $\Omega(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon})$.

The $\varepsilon$-approximation is a rather restricted data structure, as we are not allowed to store any information other than the coordinates of the points in $P$. In this paper, we explore what can be achieved without any restriction on the data structure. We first describe a simple data structure that uses $O(\frac{1}{\varepsilon}(\log^{2}\frac{1}{\varepsilon}+\log n))$ bits and answers queries with error $\varepsilon n$. We then prove a lower bound that any data structure that answers queries with error $O(\varepsilon n)$ must use $\Omega(\frac{1}{\varepsilon}(\log^{2}\frac{1}{\varepsilon}+\log n))$ bits. Our lower bound is information-theoretic: We show that there is a large collection of point sets with large union combinatorial discrepancy, which are thus hard to distinguish unless we use $\Omega(\frac{1}{\varepsilon}\log^{2}\frac{1}{\varepsilon})$ bits.

## 1 Introduction

Range counting is one of the most fundamental problems in computational geometry and data structures. Given $n$ points in $d$ dimensions, the goal is to preprocess the points into a data structure, such that the number of points in any query range can be returned. Range counting has been studied intensively, and a lot of work has focused on the space-query time tradeoff or the update-query tradeoff of the data structure. We refer the reader to the survey by Agarwal and Erickson for these results. In this paper, we look at the problem from a data summarization/compression point of view: What is the minimum amount of space that is needed to encode all the range counts approximately? Approximation is necessary here, since otherwise we would have to remember the entire point set. It is also easy to see that relative approximation will not help either, as it requires us to differentiate between empty ranges and those containing only one point. Thus, we aim at an absolute error guarantee. As we will be dealing with bit-level space complexity, it is convenient to focus on an integer grid. More formally, we are given a set $P$ of $n$ points drawn from an $n\times n$ grid and an error parameter $\varepsilon$. The goal is to build a data structure, such that for any orthogonal range $R$, the data structure can return the number of points in $P\cap R$ with additive error $\varepsilon n$.
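To make the error model concrete, here is a minimal executable sketch of the problem statement; the helper names (`range_count`, `within_additive_error`) are our own illustration, not from the paper.

```python
# Reference semantics of approximate range counting with additive error.

def range_count(points, x1, y1, x2, y2):
    """Exact number of points in the orthogonal range [x1,x2] x [y1,y2]."""
    return sum(1 for (x, y) in points if x1 <= x <= x2 and y1 <= y <= y2)

def within_additive_error(points, estimate, x1, y1, x2, y2, eps):
    """Check the guarantee |estimate - |P ∩ R|| <= eps * n."""
    n = len(points)
    exact = range_count(points, x1, y1, x2, y2)
    return abs(estimate - exact) <= eps * n
```

Any data structure discussed below is judged against the exact count, up to an additive slack of $\varepsilon n$.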

We should mention that there is another notion of approximate range counting that approximates the range itself, i.e., points near the boundary of the range may or may not be counted. Such an approximation notion clearly precludes any sublinear-space data structure as well.

### 1.1 Background and related results

#### ε-approximations.

Summarizing point sets while preserving range counts (approximately) is a fundamental problem with applications in numerical integration, statistics, and data mining, among many others. The classical solution is to use the $\varepsilon$-approximation from discrepancy theory. Consider a range space $(P,\mathcal{R})$, where $P$ is a finite point set of size $n$. A subset $A\subseteq P$ is called an $\varepsilon$-approximation of $(P,\mathcal{R})$ if

$$\max_{R\in\mathcal{R}}\left|\frac{|R\cap A|}{|A|}-\frac{|R\cap P|}{|P|}\right|\le\varepsilon.$$

This means that we can approximate $|R\cap P|$ by counting the number of points in $R\cap A$ and scaling back, with error at most $\varepsilon n$.
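The scaling step can be written down directly; the function names below are ours. As a side note, a uniform random sample of suitable size is an $\varepsilon$-approximation with high probability (by the standard VC/random-sampling argument), which is what `random_subset` hints at:

```python
import random

def scaled_estimate(P, A, x1, y1, x2, y2):
    """Estimate |P ∩ R| from the subset A by |A ∩ R| * |P| / |A|."""
    hits = sum(1 for (x, y) in A if x1 <= x <= x2 and y1 <= y <= y2)
    return hits * len(P) / len(A)

def random_subset(P, size, seed=0):
    """A uniform random subset; with appropriate size it serves as an
    epsilon-approximation with high probability (size bound omitted)."""
    return random.Random(seed).sample(P, size)
```

With $A=P$ the estimate is exact; the point of the theory is how small $A$ can be while keeping the error below $\varepsilon n$ for every range.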

Finding $\varepsilon$-approximations of small size for various geometric range spaces has been a central research topic in computational geometry. Please see the books by Matoušek and Chazelle for a comprehensive coverage of this topic. Here we only review the most relevant results, i.e., when the range space is the set of all orthogonal rectangles in 2 dimensions, which we denote as $\mathcal{R}_2$. This question dates back to Beck, who showed that there are $\varepsilon$-approximations of size $O(\frac{1}{\varepsilon}\log^{4}\frac{1}{\varepsilon})$ for any point set $P$. This was later improved to $O(\frac{1}{\varepsilon}\log^{2.5}\frac{1}{\varepsilon})$ by Srinivasan. These results were not constructive, due to the use of a non-constructive coloring with low combinatorial discrepancy for orthogonal rectangles. Recently, Bansal and Lovett et al. proposed algorithms to construct such a coloring, and therefore made these results constructive. On the lower bound side, it is known that there are point sets that require $\varepsilon$-approximations of size $\Omega(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon})$.

#### Combinatorial discrepancy.

Given a range space $(P,\mathcal{R})$ and a coloring function $\chi:P\to\{-1,+1\}$, we define the discrepancy of a range $R$ under $\chi$ to be

$$\chi(P\cap R)=\sum_{p\in P\cap R}\chi(p).$$

The discrepancy of the range space $(P,\mathcal{R})$ is defined as

$$\operatorname{disc}(P,\mathcal{R})=\min_{\chi}\max_{R\in\mathcal{R}}|\chi(P\cap R)|,$$

namely, we are looking for the coloring that minimizes the maximum color difference over all ranges in $\mathcal{R}$. This kind of discrepancy is called combinatorial discrepancy, or sometimes red-blue discrepancy. Taking the maximum over all point sets of size $n$, we say that the combinatorial discrepancy of $\mathcal{R}$ is $\operatorname{disc}(n,\mathcal{R})=\max_{|P|=n}\operatorname{disc}(P,\mathcal{R})$.

There is a close relationship between combinatorial discrepancy and $\varepsilon$-approximations, as observed by Beck. For orthogonal ranges, the relationship is particularly simple: if the combinatorial discrepancy of $\mathcal{R}_2$ is $O(f(n))$, then there is an $\varepsilon$-approximation of size $O(s)$, where $s$ is the smallest integer such that $f(s)/s\le\varepsilon$, and vice versa. In fact, all the aforementioned results on $\varepsilon$-approximations follow from the corresponding results on combinatorial discrepancy. So the current upper bound on the combinatorial discrepancy of $\mathcal{R}_2$ is $O(\log^{2.5}n)$. The lower bound is $\Omega(\log n)$, which follows from the Lebesgue discrepancy lower bound (see below). Closing the gap between the upper and the lower bound remains a major open problem in discrepancy theory. For orthogonal ranges in $d$ dimensions, the current best upper bound is $O(\log^{d+1/2}n)$ by Larsen, while the lower bound is $\Omega(\log^{d-1}n)$, which was recently proved by Matoušek and Nikolov.
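The definition of $\operatorname{disc}(P,\mathcal{R}_2)$ can be checked mechanically on tiny examples. The following exhaustive search (our own illustration; exponential time, for toy inputs only) enumerates all $2^{|P|}$ colorings and all combinatorially distinct rectangles:

```python
from itertools import product

def disc(points):
    """Combinatorial discrepancy of P w.r.t. axis-parallel rectangles:
    min over colorings chi of the max over rectangles R of |chi(P ∩ R)|."""
    xs = sorted({p[0] for p in points})
    ys = sorted({p[1] for p in points})
    # Rectangles are determined (up to point membership) by coordinate bounds.
    rects = [(x1, x2, y1, y2)
             for x1 in xs for x2 in xs if x1 <= x2
             for y1 in ys for y2 in ys if y1 <= y2]
    best = None
    for chi in product((-1, 1), repeat=len(points)):
        worst = max(abs(sum(c for (x, y), c in zip(points, chi)
                            if x1 <= x <= x2 and y1 <= y <= y2))
                    for (x1, x2, y1, y2) in rects)
        best = worst if best is None else min(best, worst)
    return best
```

For instance, any single point forces discrepancy 1, while two coincident points can be colored $+1,-1$ and cancel perfectly.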

#### Lebesgue discrepancy.

Suppose the points of $P$ are in the unit square $[0,1)^2$. The Lebesgue discrepancy of $P$ is defined to be

$$D(P,\mathcal{R}_2)=\sup_{R\in\mathcal{R}_2}\Big|\,|P\cap R|-n\cdot\big|R\cap[0,1)^2\big|\,\Big|,$$ where $|R\cap[0,1)^2|$ denotes the area of $R\cap[0,1)^2$.

The Lebesgue discrepancy describes how uniformly the point set $P$ is distributed in $[0,1)^2$. Taking the infimum over all point sets of size $n$, we say that the Lebesgue discrepancy of $\mathcal{R}_2$ is $D(n,\mathcal{R}_2)=\inf_{|P|=n}D(P,\mathcal{R}_2)$.

The Lebesgue discrepancy for $\mathcal{R}_2$ is known to be $\Theta(\log n)$. The lower bound is due to Schmidt, while there are many point sets (e.g., the Van der Corput sets and the $b$-ary nets) that are proved to have $O(\log n)$ Lebesgue discrepancy. It is well known that the combinatorial discrepancy of a range space cannot be lower than its Lebesgue discrepancy, so this also gives the $\Omega(\log n)$ lower bound on the combinatorial discrepancy of $\mathcal{R}_2$ mentioned above.
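The Van der Corput set mentioned above is easy to generate, and its uniformity can be spot-checked at individual corner ranges. A sketch (our own helper names; `star_discrepancy_at` evaluates one term of the supremum, not the supremum itself):

```python
def van_der_corput(m):
    """The 2^m-point Van der Corput set: (i/2^m, bit_reversal(i)/2^m)."""
    n = 1 << m
    def rev(i):
        r = 0
        for _ in range(m):
            r, i = (r << 1) | (i & 1), i >> 1
        return r
    return [(i / n, rev(i) / n) for i in range(n)]

def star_discrepancy_at(points, x, y):
    """| #(P ∩ [0,x)×[0,y)) - n·x·y | for a single corner range."""
    n = len(points)
    count = sum(1 for (px, py) in points if px < x and py < y)
    return abs(count - n * x * y)
```

At dyadic corners the Van der Corput set is perfectly balanced, which is the intuition behind its $O(\log n)$ discrepancy.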

#### ε-nets.

For a range space $(P,\mathcal{R})$, a subset $A\subseteq P$ is called an $\varepsilon$-net of $(P,\mathcal{R})$ if for any range $R\in\mathcal{R}$ that satisfies $|R\cap P|\ge\varepsilon|P|$, there is at least one point of $A$ in $R$. Note that an $\varepsilon$-approximation is an $\varepsilon$-net, but the converse may not be true.

For a range space $(P,\mathcal{R})$ with finite VC-dimension $d$, Haussler and Welzl showed that there exists an $\varepsilon$-net of size $O(\frac{d}{\varepsilon}\log\frac{1}{\varepsilon})$. For $\mathcal{R}_2$, the current best construction is due to Aronov, Ezra and Sharir, which has size $O(\frac{1}{\varepsilon}\log\log\frac{1}{\varepsilon})$. A recent result by Pach and Tardos shows that this bound is essentially optimal. For more results on $\varepsilon$-nets, please refer to the book by Matoušek. In this paper, our data structure will be based on an $\varepsilon$-net for $\mathcal{R}_2$.

#### Approximate range counting data structures.

The $\varepsilon$-approximation is a rather restricted data structure, as we are not allowed to store any information other than the coordinates of a subset of points in $P$. In this paper, we explore what can be achieved without any restriction on the data structure. In 1 dimension, there is nothing better: an $\varepsilon$-approximation has size $O(1/\varepsilon)$, which takes $O(\frac{1}{\varepsilon}\log n)$ bits. On the other hand, simply consider the case where the points are divided into $\Theta(1/\varepsilon)$ groups of size $\Theta(\varepsilon n)$, where all points in each group have the same location. There are $n^{\Omega(1/\varepsilon)}$ such point sets, and the data structure has to differentiate all of them. Thus $\Omega(\frac{1}{\varepsilon}\log n)$ is a lower bound on the number of bits used by the data structure.
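The 1-dimensional folklore solution keeps every $\Theta(\varepsilon n)$-th order statistic, i.e., an $\varepsilon$-approximation of size $O(1/\varepsilon)$. A minimal sketch (helper names are ours):

```python
def build_summary(values, eps):
    """Keep every step-th smallest value, where step = max(1, floor(eps*n))."""
    values = sorted(values)
    step = max(1, int(eps * len(values)))
    return values[::step], step

def approx_rank(summary, q):
    """Approximate #{v <= q}: overestimates by less than step <= eps*n."""
    kept, step = summary
    return sum(1 for v in kept if v <= q) * step
```

Every 1-dimensional range count is a difference of two such rank queries, so the additive error stays $O(\varepsilon n)$.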

Finally, we remark that there is also other work on approximate range counting with various error measures, such as relative $\varepsilon$-approximations, relative-error data structures [1, 4], and the absolute error model. These error measures are different from ours, and it is not clear if these problems admit sublinear-space solutions.

### 1.2 Our results

This paper settles the following problem: how many bits do we need to encode all the orthogonal range counts with additive error $\varepsilon n$ for a point set on the plane? We first show that if we are allowed to store extra information other than the coordinates of the points, then there is a data structure that uses $O(\frac{1}{\varepsilon}(\log^{2}\frac{1}{\varepsilon}+\log n))$ bits. This improves on $\varepsilon$-approximations by polylogarithmic factors.

The majority of the paper is the proof of a matching lower bound: we show that for $\varepsilon\ge c\log n/n$ for some constant $c$, any data structure that answers queries with error $\varepsilon n$ must use $\Omega(\frac{1}{\varepsilon}\log^{2}\frac{1}{\varepsilon})$ bits. In particular, if we set $\varepsilon=\Theta(\log n/n)$, then any data structure that answers queries with error $O(\log n)$ must use $\Omega(n\log n)$ bits, which implies that answering queries with error $O(\log n)$ is as hard as answering the queries exactly.

The core of our lower bound proof is the construction of a collection $\mathcal{Q}$ of point sets with large union combinatorial discrepancy. More precisely, we show that the union of any two point sets in $\mathcal{Q}$ has high combinatorial discrepancy, i.e., at least $c\log n$ for some constant $c$. Then, for any two point sets $P_1,P_2\in\mathcal{Q}$, if $\operatorname{disc}(P_1\cup P_2,\mathcal{R}_2)\ge c\log n$, that means for any coloring $\chi$ on $P_1\cup P_2$, there must exist a rectangle $R$ such that $|\chi((P_1\cup P_2)\cap R)|\ge c\log n$. Consider the coloring where $\chi(p)=1$ if $p\in P_1$ and $\chi(p)=-1$ if $p\in P_2$. Then there exists a rectangle $R$ such that $\big||P_1\cap R|-|P_2\cap R|\big|\ge c\log n$. This implies that a data structure that answers queries with error less than $\frac{c}{2}\log n$ has to distinguish $P_1$ and $P_2$. Thus, to distinguish all the point sets in $\mathcal{Q}$, the data structure has to use at least $\log|\mathcal{Q}|$ bits, which yields a tight lower bound for $\varepsilon=\Theta(\log n/n)$. We will show how the combinatorial discrepancy bound implies a tight lower bound for arbitrary $\varepsilon$ in Section 3.

While point sets with low Lebesgue discrepancy or high combinatorial discrepancy have been extensively studied, here we construct a large collection of point sets in which every pairwise union has high combinatorial discrepancy. This particular aspect appears to be novel, and our construction could be useful in proving other space lower bounds. It may also find applications in situations where we need a “diverse” collection of (pseudo) random point sets.

## 2 Upper Bound

In this section, we build a data structure that supports approximate range counting queries. Given a set $P$ of $n$ points on an $n\times n$ grid, our data structure uses $O(\frac{1}{\varepsilon}(\log^{2}\frac{1}{\varepsilon}+\log n))$ bits and answers an orthogonal range counting query with error $\varepsilon n$. We note that it is sufficient to consider only two-sided ranges, since a 4-sided range counting query can be expressed as a linear combination of four two-sided range counting queries by the inclusion-exclusion principle. A two-sided range is specified by a rectangle of the form $[0,x]\times[0,y]$, where $q=(x,y)$ is called the query point.

#### The data structure.

Our data structure is an approximate variant of Chazelle’s linear-space version of the range tree, originally designed for exact orthogonal range counting. Consider a set $P$ of $n$ points on an $n\times n$ grid. We divide $P$ into the left point set $P_l$ and the right point set $P_r$ by the median of the $x$-coordinates. We recursively build a data structure for $P_l$ and $P_r$. Let $t$ be a parameter to be determined later. Let $Q_P$ denote the $t$-quantiles of the $y$-coordinates of $P$; note that the $i$-th quantile is the $y$-coordinate in $P$ with exactly $i\cdot t$ points below it, so $|Q_P|=|P|/t$. We use indices to represent $Q_P$, where index $i$ denotes the $i$-th quantile. We do not explicitly store the $y$-values or even the indices of $Q_P$. Instead, for each index in $Q_P$ with coordinate $y$, we store a pointer to the successor of $y$ in $Q_{P_l}$. Note that these pointers form a monotone increasing sequence of indices in $Q_{P_l}$, and can be encoded in $O(|P|/t)$ bits. Similarly, we store the successor pointers from $Q_P$ to $Q_{P_r}$ with $O(|P|/t)$ bits. It follows that the space in bits satisfies the recursion $S(m)=2S(m/2)+O(m/t)$, with base case $S(t)=O(1)$. The recurrence solves to $S(n)=O(\frac{n}{t}\log\frac{n}{t})$. Finally, we explicitly store the quantiles for the $y$-coordinates of $P$ at the root with $O(\frac{1}{\varepsilon}\log n)$ bits.

Given a query point $q=(x,y)$, for simplicity we assume $y$ is in $Q_P$. If not, we can use the successor of $y$ in $Q_P$ as an estimate, which adds additive error at most $t$ to the final count. If $x$ falls in $P_l$, we follow the pointer to find the successor of $y$ in $Q_{P_l}$, and then recurse the problem in $P_l$. If $x$ falls in $P_r$, we first follow the pointer to get the successor of $y$ in $Q_{P_l}$. This gives an approximate count for $P_l$ (all of whose points have $x$-coordinates below $x$) with additive error $O(t)$. We then follow the pointer to get the successor of $y$ in $Q_{P_r}$, and recurse the problem in $P_r$. Note that rounding with the successor in $Q_{P_l}$ or $Q_{P_r}$ causes additive error $O(t)$, and using the approximate count for $P_l$ also causes additive error $O(t)$. Thus, the overall additive error satisfies $E(m)=E(m/2)+O(t)$, with base case $E(t)=O(t)$. The recurrence solves to $E(n)=O(t\log\frac{n}{t})$, and we can then set $t=\Theta(\varepsilon n/\log\frac{1}{\varepsilon})$ to make $E(n)\le\varepsilon n$. It follows that $n/t=O(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon})$, and thus the total space usage is $O(\frac{1}{\varepsilon}(\log^{2}\frac{1}{\varepsilon}+\log n))$ bits. The query time can also be made $O(\log n)$, if we use succinct rank-select structures to encode the pointers, as in Chazelle’s method.
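A simplified, self-contained rendering of this structure in plain Python (our own sketch: it stores explicit quantile values instead of compressed successor pointers, so it illustrates the query walk and the per-level error, not the bit bound):

```python
def build(points, t):
    """Split tree on x; each internal node keeps every t-th y-value of its
    left half. Small nodes are stored verbatim."""
    points = sorted(points)
    if len(points) <= 2 * t:
        return {"pts": points}
    mid = len(points) // 2
    left, right = points[:mid], points[mid:]
    return {
        "xsplit": right[0][0],
        "t": t,
        "lyq": sorted(y for _, y in left)[t - 1::t],  # coarse y-quantiles
        "left": build(left, t),
        "right": build(right, t),
    }

def query(node, qx, qy):
    """Approximate |{(x,y): x <= qx, y <= qy}|; each level visited on the
    right branch underestimates by less than t."""
    if "pts" in node:                      # small leaf: count exactly
        return sum(1 for x, y in node["pts"] if x <= qx and y <= qy)
    if qx < node["xsplit"]:
        return query(node["left"], qx, qy)
    # Every left point has x <= qx, so only its approximate y-rank matters.
    left_approx = sum(1 for y in node["lyq"] if y <= qy) * node["t"]
    return left_approx + query(node["right"], qx, qy)
```

Since the error grows by less than $t$ per level of the recursion, the total error is $O(t\log(n/t))$, matching the analysis above.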

###### Theorem 2.1.

Given a set $P$ of $n$ points drawn from an $n\times n$ grid, there is a data structure that uses $O(\frac{1}{\varepsilon}(\log^{2}\frac{1}{\varepsilon}+\log n))$ bits and answers orthogonal range counting queries with additive error $\varepsilon n$.

## 3 Lower Bound

In this section, we prove a lower bound that matches the upper bound in Theorem 2.1.

###### Theorem 3.1.

Consider a set of $n$ points drawn from an $n\times n$ grid. A data structure that answers orthogonal range counting queries with additive error $\varepsilon n$ for any such point set must use $\Omega(\frac{1}{\varepsilon}(\log^{2}\frac{1}{\varepsilon}+\log n))$ bits.

To prove Theorem 3.1, we need the following theorem on union discrepancy.

###### Theorem 3.2.

Let $\mathcal{P}$ denote the collection of all $n$-point sets drawn from an $n\times n$ grid. There exists a constant $c$ and a sub-collection $\mathcal{Q}\subseteq\mathcal{P}$ of size $2^{\Omega(n\log n)}$, such that for any two point sets $P_1,P_2\in\mathcal{Q}$, their union discrepancy satisfies $\operatorname{disc}(P_1\cup P_2,\mathcal{R}_2)\ge c\log n$.

We first show how Theorem 3.2 implies Theorem 3.1.

###### Proof of Theorem 3.1.

We only need to prove the $\Omega(\frac{1}{\varepsilon}\log^{2}\frac{1}{\varepsilon})$ part of the lower bound, as the $\Omega(\frac{1}{\varepsilon}\log n)$ part follows from the one-dimensional argument in Section 1.1. Suppose we group the points into fat points, each consisting of $\varepsilon n/\log\frac{1}{\varepsilon}$ points at the same location, so that there are $N=\frac{1}{\varepsilon}\log\frac{1}{\varepsilon}$ fat points, placed on an $N\times N$ grid. By Theorem 3.2, there is a collection $\mathcal{Q}$ of $2^{\Omega(N\log N)}$ fat point sets, such that for any two fat point sets $P_1,P_2\in\mathcal{Q}$, there exists a rectangle $R$ such that the numbers of fat points of $P_1$ and $P_2$ in $R$ differ by at least $c\log N$. Since each fat point corresponds to $\varepsilon n/\log\frac{1}{\varepsilon}$ points, it follows that the counts of $P_1\cap R$ and $P_2\cap R$ differ by at least

$$\frac{\varepsilon n}{\log\frac{1}{\varepsilon}}\cdot c\log N=\frac{\varepsilon n}{\log\frac{1}{\varepsilon}}\cdot c\log\Big(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon}\Big)\ge c\,\varepsilon n.$$

Therefore, a data structure that answers queries with error $\frac{c}{2}\varepsilon n$ has to distinguish $P_1$ and $P_2$. Thus, to distinguish all the point sets in $\mathcal{Q}$, the data structure has to use at least $\log|\mathcal{Q}|=\Omega(N\log N)=\Omega(\frac{1}{\varepsilon}\log^{2}\frac{1}{\varepsilon})$ bits. ∎

In the rest of this section, we focus on proving Theorem 3.2. To derive the sub-collection $\mathcal{Q}$ in Theorem 3.2, we begin by looking into a collection of point sets called binary nets. Binary nets are a special type of point set under the more general concept of $(t,m,s)$-nets, which were introduced as an example of point sets with low Lebesgue discrepancy. See the survey by Clayman et al. or the book by Hellekalek et al. for more results on $(t,m,s)$-nets. In this paper we show that binary nets have two other nice properties: 1) a binary net $P$ has high combinatorial discrepancy, i.e., $\operatorname{disc}(P,\mathcal{R}_2)=\Omega(\log n)$; 2) there is a bit-vector representation for every binary net, which allows us to extract a sub-collection by constructing a subset of bit vectors. In the following sections, we will define binary nets and formalize these two properties.

### 3.1 Definitions

For ease of presentation, we assume that the grid is embedded in the unit square $[0,1)^2$. We partition $[0,1)^2$ into $n^2$ squares, each of size $\frac{1}{n}\times\frac{1}{n}$. We assume the grid points are placed at the mass centers of the squares, that is, each grid point has coordinates $(\frac{2i+1}{2n},\frac{2j+1}{2n})$ for $i,j\in[n]$, where $[n]$ denotes the set of all integers in $\{0,1,\ldots,n-1\}$. For the sake of simplicity, we define the $(i,j)$ grid point to be the grid point with coordinates $(\frac{2i+1}{2n},\frac{2j+1}{2n})$, and we do not distinguish a grid point and the square it resides in.

Now we introduce the concepts of $(a,b)$-cells and $k$-canonical cells.

###### Definition 3.1.

An $(a,b)$-cell at position $(i,j)$ is the rectangle $[\frac{i}{2^a},\frac{i+1}{2^a})\times[\frac{j}{2^b},\frac{j+1}{2^b})$. We use $C_{a,b}(i,j)$ to denote the $(a,b)$-cell at position $(i,j)$, and $\mathcal{C}_{a,b}$ to denote the set of all $(a,b)$-cells.

###### Definition 3.2.

A $k$-canonical cell at position $(i,j)$ is a $(\log n-k,\,k)$-cell, i.e., a cell of dimensions $\frac{2^k}{n}\times\frac{1}{2^k}$. We use $G_k(i,j)$ to denote the $k$-canonical cell at position $(i,j)$, and $\mathcal{G}_k$ to denote the set of all $k$-canonical cells.

Figure 1 is the illustration of $(a,b)$-cells and canonical cells. Note that the position $(i,j)$ for a $k$-canonical cell takes value in $[n/2^k]\times[2^k]$. In particular, we call $C_{\log n,0}(i,0)$ the $i$-th column and $C_{0,\log n}(0,j)$ the $j$-th row. Note that for a fixed $k$, $\mathcal{G}_k$ partitions the grid into $n$ rectangles. Based on the definition of $k$-canonical cells, we define the binary nets:

###### Definition 3.3.

A point set $P$ of $n$ grid points is called a binary net if for any $0\le k\le\log n$, $P$ has exactly one point in each $k$-canonical cell.

Let $\mathcal{B}$ denote the collection of binary nets. In other words, $\mathcal{B}$ is the set

$$\mathcal{B}=\Big\{P \ \Big|\ |P\cap G_k(i,j)|=1,\ 0\le k\le\log n,\ i\in[n/2^k],\ j\in[2^k]\Big\}.$$
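The net condition can be verified mechanically. A sketch (our own helper names; the cell orientation follows our reading of Definition 3.2, with $k$-canonical cells measuring $2^k\times n/2^k$ in grid units): the classical bit-reversal permutation set is one member of $\mathcal{B}$, while, e.g., the identity permutation is not.

```python
def bit_reversal_net(m):
    """{(i, rev_m(i)) : i in [2^m]}: the bit-reversal permutation set."""
    n = 1 << m
    def rev(i):
        r = 0
        for _ in range(m):
            r, i = (r << 1) | (i & 1), i >> 1
        return r
    return [(i, rev(i)) for i in range(n)]

def is_binary_net(points, m):
    """For every k, each k-canonical cell (2^k wide, n/2^k tall, in grid
    units) must contain exactly one point."""
    n = 1 << m
    for k in range(m + 1):
        w, h = 1 << k, n >> k
        cells = {}
        for x, y in points:
            key = (x // w, y // h)
            cells[key] = cells.get(key, 0) + 1
        if len(cells) != n or any(c != 1 for c in cells.values()):
            return False
    return True
```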

It is known that the point sets in $\mathcal{B}$ have Lebesgue discrepancy $O(\log n)$; below we show that they also have $\Omega(\log n)$ combinatorial discrepancy. However, the union of two point sets in $\mathcal{B}$ could have combinatorial discrepancy as low as $O(1)$. Thus we need to carefully extract a subset from $\mathcal{B}$ with high pairwise union discrepancy.

### 3.2 Combinatorial Discrepancy and Corner Volume

In this section, we focus on proving the following theorem, which shows that the combinatorial discrepancy of a binary net is large.

###### Theorem 3.3.

For any point set $P\in\mathcal{B}$, we have $\operatorname{disc}(P,\mathcal{R}_2)=\Omega(\log n)$.

Strictly speaking, Theorem 3.2 does not depend on Theorem 3.3, but this theorem gives us some insight into the binary nets. Moreover, a key lemma in proving Theorem 3.2 (Lemma 3.3) shares essentially the same proof as Theorem 3.3. To prove Theorem 3.3, we need the following definition of corner volume:

###### Definition 3.4.

Consider a point set $P\in\mathcal{B}$ and a $k$-canonical cell $G_k(i,j)$. Let $p$ be the point of $P$ in $G_k(i,j)$. We define the corner volume $V_P(k,i,j)$ to be the volume of the orthogonal rectangle defined by $p$ and its nearest corner of $G_k(i,j)$. We use $S_P$ to denote the summation of the corner volumes over all possible triples $(k,i,j)$, that is,

$$S_P=\sum_{k=0}^{\log n}\sum_{i=0}^{n/2^k-1}\sum_{j=0}^{2^k-1}V_P(k,i,j).$$

See Figure 2 for the illustration of corner volumes. A key insight of our lower bound proof is the following lemma, which relates the combinatorial discrepancy of $P$ to its corner volume sum $S_P$.
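The corner volume sum can be evaluated directly. A sketch (our own code; it assumes the cell orientation of Definition 3.2, grid points at square centers, and that the input is a binary net given in integer grid coordinates, so that summing over points is the same as summing over cells):

```python
def corner_volume_sum(points, m):
    """S_P: for each scale k and each point, the area of the rectangle
    spanned by the point and the nearest corner of its k-canonical cell."""
    n = 1 << m
    total = 0.0
    for k in range(m + 1):
        w, h = 1 << k, n >> k                      # cell size in grid units
        for gx, gy in points:
            px, py = (2 * gx + 1) / (2 * n), (2 * gy + 1) / (2 * n)
            x0, y0 = (gx // w) * w / n, (gy // h) * h / n
            dx = min(px - x0, x0 + w / n - px)     # corner x-distance
            dy = min(py - y0, y0 + h / n - py)     # corner y-distance
            total += dx * dy
    return total
```

For the 2-point net $\{(0,0),(1,1)\}$ (with $m=1$), every one of the four cell/point pairs contributes $\frac14\cdot\frac14$, giving $S_P=\frac14$.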

###### Lemma 3.1.

There exists a constant $c>0$, such that for any point set $P\in\mathcal{B}$ with corner volume sum

$$S_P\ge c\log n,$$

we have $\operatorname{disc}(P,\mathcal{R}_2)=\Omega(\log n)$.

The proof of Lemma 3.1 makes use of Roth’s orthogonal function method, which is widely used for proving lower bounds on the Lebesgue discrepancy (see [9, 19]).

###### Proof.

Consider a binary net $P$ that satisfies $S_P\ge c\log n$, where $c$ is a constant to be determined later. Given any coloring $\chi$ and a point $x=(x_1,x_2)\in[0,1)^2$, the combinatorial discrepancy at $x$ is defined to be

$$D(x)=\sum_{p\in P\cap[0,x_1)\times[0,x_2)}\chi(p).$$

If we can prove $\sup_{x\in[0,1)^2}|D(x)|=\Omega(\log n)$ for every coloring $\chi$, the lemma will follow.

For $k=0,\ldots,\log n$, we define normalized wavelet functions $f_k$ as follows: for each $k$-canonical cell $G_k(i,j)$, let $p$ denote the point of $P$ contained in it. We subdivide $G_k(i,j)$ into four equal-size quadrants, and use $G_k(i,j)^{UR}$, $G_k(i,j)^{UL}$, $G_k(i,j)^{LR}$, $G_k(i,j)^{LL}$ to denote the upper-right, upper-left, lower-right and lower-left quadrants, respectively (see Figure 2). Set $f_k=\chi(p)$ over quadrants $G_k(i,j)^{UR}$ and $G_k(i,j)^{LL}$, and $f_k=-\chi(p)$ over the other two quadrants. To truly reveal the power of these wavelet functions, we define a more general class of functions called checkered functions.

###### Definition 3.1.

We say a function $f:[0,1)^2\to\{-1,+1\}$ is $(a,b)$-checkered if for each $(a,b)$-cell $C$, there exists a color $c\in\{-1,+1\}$ such that $f$ is equal to $c$ over $C^{LL}$ and $C^{UR}$, and $-c$ over the other two quadrants.

Note that our definition of checkered functions is slightly different from the one used in previous work. It is easy to see that the wavelet function $f_k$ is $(\log n-k,\,k)$-checkered, and that the integral of an $(a,b)$-checkered function over an $(a,b)$-cell is $0$. The following fact states that the checkered property is “closed” under multiplication.

###### Fact 3.1.

If $f$ is $(a_1,b_1)$-checkered and $g$ is $(a_2,b_2)$-checkered, where $a_1\ge a_2$ and $b_1\le b_2$, then $fg$ is $(a_1,b_2)$-checkered.

For a proof, consider an $(a_1,b_2)$-cell $C$. We observe that this cell is the intersection of an $(a_1,b_1)$-cell and an $(a_2,b_2)$-cell, and we use $C_f$ and $C_g$ to denote these two cells, respectively. Therefore the four quadrants of $C$ are contained in the intersections of two neighboring quadrants of $C_f$ and two neighboring quadrants of $C_g$. Without loss of generality, we assume $C$ lies in the intersection of the two upper quadrants of $C_f$ and the two left quadrants of $C_g$ (see Figure 3). Since $f$ is $(a_1,b_1)$-checkered and $g$ is $(a_2,b_2)$-checkered, the restriction of $f$ to $C$ takes some value $c_f$ on the left half of $C$ and $-c_f$ on the right half, while the restriction of $g$ takes some value $c_g$ on the lower half and $-c_g$ on the upper half. It follows that $fg$ is equal to $c_fc_g$ over $C^{LL}$ and $C^{UR}$, and $-c_fc_g$ over $C^{LR}$ and $C^{UL}$. Thus $fg$ is an $(a_1,b_2)$-checkered function.

A direct corollary from Fact 3.1 is that the wavelet functions are generalized orthogonal:

###### Corollary 3.1.

For $0\le k_1<k_2<\cdots<k_l\le\log n$, the function $f_{k_1}f_{k_2}\cdots f_{k_l}$ is $(\log n-k_1,\,k_l)$-checkered. As a consequence, we have

$$\int_{[0,1)^2}f_{k_1}(x)\cdots f_{k_l}(x)\,dx=0.$$

In the remainder of the paper we assume the range of the integration is $[0,1)^2$ and the variable of integration is $x$ when not specified. We define the Riesz product

$$G(x)=-1+\prod_{k=0}^{\log n}(\gamma f_k(x)+1),$$

where $\gamma\in(0,1)$ is some constant to be determined later. By the inequality

$$\left|\int GD\right|\le\int|GD|\le\sup_{x\in[0,1)^2}|D(x)|\cdot\int|G|,$$

we can lower-bound the combinatorial discrepancy of as follows:

$$\sup_{x\in[0,1)^2}|D(x)|\ \ge\ \left|\int GD\right|\bigg/\int|G|. \qquad (3.1)$$

For the denominator $\int|G|$, we have

$$\int|G|=\int\bigg|-1+\prod_{k=0}^{\log n}(\gamma f_k+1)\bigg|\le 1+\int\prod_{k=0}^{\log n}(\gamma f_k+1)=2+\sum_{l=1}^{\log n+1}\gamma^l\sum_{0\le k_1<\cdots<k_l\le\log n}\int f_{k_1}\cdots f_{k_l}=2. \qquad (3.2)$$

The last equation is due to Corollary 3.1. The numerator can be expressed as follows:

$$\left|\int GD\right|=\bigg|\int\Big(-1+\prod_{k=0}^{\log n}(\gamma f_k+1)\Big)D\bigg|=\bigg|\int\Big(\gamma\sum_{k=0}^{\log n}f_k+\sum_{l=2}^{\log n+1}\gamma^l\sum_{0\le k_1<\cdots<k_l\le\log n}f_{k_1}\cdots f_{k_l}\Big)D\bigg|\ \ge\ \gamma\bigg|\sum_{k=0}^{\log n}\int f_kD\bigg|-\sum_{l=2}^{\log n+1}\gamma^l\sum_{0\le k_1<\cdots<k_l\le\log n}\bigg|\int f_{k_1}\cdots f_{k_l}D\bigg|. \qquad (3.3)$$

In order to estimate $\int f_kD$, we consider the integration of a single product over a $k$-canonical cell $G_k(i,j)$. Recall that there is exactly one point of $P$ that lies in $G_k(i,j)$. We use $q$ to denote this point, and $\chi(q)$ to denote its color. Define the horizontal vector $u=(\frac{2^k}{2n},0)$ and the vertical vector $v=(0,\frac{1}{2^{k+1}})$. Then for any point $x\in G_k(i,j)^{LL}$, the points $x+u$, $x+v$ and $x+u+v$ are the analogous points in quadrants $G_k(i,j)^{LR}$, $G_k(i,j)^{UL}$ and $G_k(i,j)^{UR}$, respectively (see Figure 2). The four analogous points define an orthogonal rectangle. We use $R_x$ to denote this rectangle, and the function $R(x)$ to denote the indicator of the event $q\in R_x$, that is, $R(x)=1$ if $q\in R_x$ and $R(x)=0$ otherwise. We can express the integral as

$$\int_{G_k(i,j)}f_k(x)D(x)\,dx=\int_{G_k(i,j)^{LL}}\chi(q)\big(D(x)-D(x+u)-D(x+v)+D(x+u+v)\big)\,dx=\int_{G_k(i,j)^{LL}}\chi(q)\cdot\chi(q)R(x)\,dx=\int_{G_k(i,j)^{LL}}R(x)\,dx.$$

The second equation holds because $D(x)-D(x+u)-D(x+v)+D(x+u+v)$ only counts points inside $R_x$, which can only be $q$, or nothing. Observe that $R(x)=1$ if and only if one of $x$’s analogous points (or $x$ itself) lies inside the rectangle defined by $q$ and its nearest corner of $G_k(i,j)$ (see Figure 2), so we have

$$\int_{G_k(i,j)}f_kD=\int_{G_k(i,j)^{LL}}R=V_P(k,i,j). \qquad (3.4)$$

Now we can compute the first term in (3.3):

$$\gamma\bigg|\sum_{k=0}^{\log n}\int f_kD\bigg|=\gamma\bigg|\sum_{k=0}^{\log n}\sum_{i=0}^{n/2^k-1}\sum_{j=0}^{2^k-1}\int_{G_k(i,j)}f_kD\bigg|=\gamma\bigg|\sum_{k=0}^{\log n}\sum_{i=0}^{n/2^k-1}\sum_{j=0}^{2^k-1}V_P(k,i,j)\bigg|=\gamma S_P\ge c\gamma\log n. \qquad (3.5)$$

For the second term in (3.3), consider a $(\log n-k_1,\,k_l)$-cell $C$. Note that $C$ intersects at most one point of $P$, since it is contained in a $k_1$-canonical cell. By Fact 3.1, the function $f_{k_1}\cdots f_{k_l}$ is $(\log n-k_1,\,k_l)$-checkered, so following arguments similar to those in the proof of equation (3.4), we can show that the integral over $C$ is $0$ if $C$ contains no point of $P$, and otherwise is at most the corner volume of that point within $C$. In the latter case, we can relax the corner volume to the volume of $C$, that is, $\frac{1}{2^{k_l-k_1}\,n}$. Thus we can estimate the integral as follows:

$$\bigg|\int_{C_{\log n-k_1,\,k_l}(i,j)}f_{k_1}\cdots f_{k_l}D\bigg|\le\frac{1}{2^{k_l-k_1}\,n}.$$

Since there are at most $n$ non-empty $(\log n-k_1,\,k_l)$-cells, we have

$$\bigg|\int f_{k_1}\cdots f_{k_l}D\bigg|\le n\cdot\frac{1}{2^{k_l-k_1}\,n}=\frac{1}{2^{k_l-k_1}}.$$

Now we can estimate the second term in (3.3):

$$\sum_{l=2}^{\log n+1}\gamma^l\sum_{0\le k_1<\cdots<k_l\le\log n}\bigg|\int f_{k_1}\cdots f_{k_l}D\bigg|\le\sum_{l=2}^{\log n+1}\gamma^l\sum_{0\le k_1<\cdots<k_l\le\log n}\frac{1}{2^{k_l-k_1}}=\sum_{l=2}^{\log n+1}\gamma^l\sum_{w=l-1}^{\log n}\sum_{k_l-k_1=w}\frac{1}{2^{w}}\binom{w-1}{l-2}. \qquad (3.6)$$

For the last equation we replace $k_l-k_1$ with a new index $w$ and use the fact that there are $\binom{w-1}{l-2}$ ways to choose $k_2,\ldots,k_{l-1}$ in an interval of length $w$. Note that for a fixed $w$, there are $\log n+1-w$ possible values for $k_1$, so

$$\sum_{l=2}^{\log n+1}\gamma^l\sum_{w=l-1}^{\log n}\sum_{k_l-k_1=w}\frac{1}{2^{w}}\binom{w-1}{l-2}=\sum_{l=2}^{\log n+1}\gamma^l\sum_{w=l-1}^{\log n}\frac{\log n+1-w}{2^{w}}\binom{w-1}{l-2}\ \le\ \sum_{l=2}^{\log n+1}\gamma^l\sum_{w=l-1}^{\log n}\frac{\log n}{2^{w}}\binom{w-1}{l-2}\ =\ \log n\sum_{l=2}^{\log n+1}\gamma^l\sum_{w=l-1}^{\log n}\frac{1}{2^{w}}\binom{w-1}{l-2}. \qquad (3.7)$$

By inverting the order of the summation,

$$\log n\sum_{l=2}^{\log n+1}\gamma^l\sum_{w=l-1}^{\log n}\frac{1}{2^{w}}\binom{w-1}{l-2}=\gamma^2\log n\sum_{w=1}^{\log n}\frac{1}{2^{w}}\sum_{l=2}^{w+1}\binom{w-1}{l-2}\gamma^{l-2}=\gamma^2\log n\sum_{w=1}^{\log n}\frac{(1+\gamma)^{w-1}}{2^{w}}=\frac{\gamma^2}{2}\log n\sum_{w=1}^{\log n}\Big(\frac{1+\gamma}{2}\Big)^{w-1}\le\frac{2\gamma^2}{1-\gamma}\log n. \qquad (3.8)$$

So from (3.5), (3.6), (3.7) and (3.8) we have

$$\left|\int GD\right|\ge c\gamma\log n-\frac{2\gamma^2}{1-\gamma}\log n.$$

Setting $\gamma$ small enough (as a function of $c$) so that the right-hand side is $\Omega(\log n)$, and combining with (3.1) and (3.2), completes the proof. ∎

Now we can give a proof of Theorem 3.3. By Lemma 3.1, we only need to show that the corner volume sum of any point set $P\in\mathcal{B}$ is large. Fix $k$ and consider a $k$-canonical cell $G_k(i,j)$. Let $p$ denote the point of $P$ in $G_k(i,j)$. We define the corner $x$-distance of $G_k(i,j)$ to be the difference between the $x$-coordinate of $p$ and that of its nearest corner of $G_k(i,j)$. The corner $y$-distance is defined in a similar manner. See Figure 2. We use $X(k,i,j)$ and $Y(k,i,j)$ to denote the corner $x$-distance and corner $y$-distance, respectively. Note that the corner volume $V_P(k,i,j)$ is the product of $X(k,i,j)$ and $Y(k,i,j)$. The following fact holds for the $x$-distances of the canonical cells in a column:

###### Fact 3.2.

Fix $k$ and $i$. Then $\{X(k,i,j)\mid j\in[2^k]\}=\{\frac{2m+1}{2n},\frac{2m+1}{2n}\mid m\in[2^{k-1}]\}$, where both sides are taken as multisets.

For a proof, note that the cells $G_k(i,j)$, $j\in[2^k]$, tile a vertical strip consisting of the $2^k$ columns $i\cdot 2^k,\ldots,(i+1)2^k-1$. There are $2^k$ points of $P$ in this strip, and they must reside in different columns. Therefore there is exactly one point in each of the $2^k$ columns, and their corner $x$-distances span $\frac{1}{2n},\frac{3}{2n},\ldots,\frac{2^k-1}{2n}$, with each value hit exactly twice (once from each half of the strip, since distances are measured to the nearest vertical edge). Similarly, we have

###### Fact 3.3.

Fix $k$ and $j$. Then $\{Y(k,i,j)\mid i\in[n/2^k]\}=\{\frac{2m+1}{2n},\frac{2m+1}{2n}\mid m\in[n/2^{k+1}]\}$, where both sides are taken as multisets.

Now consider the product of $X(k,i,j)$ and $Y(k,i,j)$ over all $(i,j)$ for a fixed $k$:

$$\prod_{i=0}^{n/2^k-1}\prod_{j=0}^{2^k-1}V_P(k,i,j)=\prod_{i=0}^{n/2^k-1}\prod_{j=0}^{2^k-1}X(k,i,j)\,Y(k,i,j)=\prod_{i=0}^{n/2^k-1}