Shallow Packings in Geometry
We refine the bound on the packing number, originally shown by Haussler, for shallow geometric set systems. Specifically, let $V$ be a finite set system defined over an $n$-point set $X$; we view $V$ as a set of indicator vectors over the $n$-dimensional unit cube. A $\delta$-separated set of $V$ is a subcollection $W$, s.t. the Hamming distance between each pair $u, v \in W$ is greater than $\delta$, where $\delta > 0$ is an integer parameter. The $\delta$-packing number is then defined as the cardinality of the largest $\delta$-separated subcollection of $V$. Haussler showed an asymptotically tight bound of $\Theta((n/\delta)^d)$ on the $\delta$-packing number if $V$ has VC-dimension (or primal shatter dimension) $d$. We refine this bound for the scenario where, for any subset $X' \subseteq X$ of size $m \le n$ and for any parameter $1 \le k \le m$, the number of vectors of length at most $k$ in the restriction of $V$ to $X'$ is only $O(m^{d_1} k^{d - d_1})$, for a fixed integer $d > 0$ and a real parameter $1 \le d_1 \le d$ (this generalizes the standard notion of bounded primal shatter dimension when $d_1 = d$). In this case when $V$ is “$k$-shallow” (all vector lengths are at most $k$), we show that its $\delta$-packing number is $O(n^{d_1} k^{d - d_1} / \delta^d)$, matching Haussler’s bound for the special cases where $d_1 = d$ or $k = n$. As an immediate consequence we conclude that set systems of halfspaces, balls, and parallel slabs defined over $n$ points in $d$-space admit better packing numbers when $k$ is smaller than $n$. Last but not least, we describe applications to (i) spanning trees of low total crossing number, and (ii) geometric discrepancy, based on previous work by the author.
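Let $V$ be a set system defined over an $n$-point set $X$. We follow the notation in , and view $V$ as a set of indicator vectors in $\mathbb{R}^n$, that is, $V \subseteq \{0,1\}^n$. Given a subsequence of indices (coordinates) $I = (i_1, \ldots, i_k)$, $1 \le i_j \le n$, $k \le n$, the projection $V|_I$ of $V$ onto $I$ (also referred to as the restriction of $V$ to $I$) is defined as
$$V|_I = \{ (v_{i_1}, \ldots, v_{i_k}) \mid v = (v_1, \ldots, v_n) \in V \}.$$
With a slight abuse of notation we write $I \subseteq [n]$ to state the fact that $I$ is a subsequence of indices as above. We now recall the definition of the primal shatter function of $V$:
The primal shatter function of $V \subseteq \{0,1\}^n$ is a function, denoted by $\pi_V$, whose value at $m$ is defined by $\pi_V(m) = \max_{I \subseteq [n],\, |I| = m} |V|_I|$. In other words, $\pi_V(m)$ is the maximum possible number of distinct vectors of $V$ when projected onto a subsequence of $m$ indices.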
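The definition above can be made concrete with a brute-force sketch. The function names (`project`, `primal_shatter`, `interval_system`) are hypothetical and chosen only for illustration, and the enumeration over all index subsequences is exponential, so this is meant for toy inputs rather than as a practical procedure. The example set system, all intervals over points on a line, is an assumption of this sketch and not one of the systems analyzed in the text.

```python
from itertools import combinations

def project(vectors, indices):
    """Distinct projections of the vectors onto the index sequence `indices`."""
    return {tuple(v[i] for i in indices) for v in vectors}

def primal_shatter(vectors, m):
    """Maximum number of distinct projected vectors over all m-index subsequences."""
    vectors = list(vectors)
    n = len(vectors[0])
    return max(len(project(vectors, idx)) for idx in combinations(range(n), m))

def interval_system(n):
    """Indicator vectors of all intervals (including the empty one) over n collinear points."""
    return {tuple(1 if i <= t < j else 0 for t in range(n))
            for i in range(n + 1) for j in range(i, n + 1)}

V = interval_system(4)
print([primal_shatter(V, m) for m in range(1, 5)])  # [2, 4, 7, 11]
```

For intervals the values grow quadratically in the number of sampled indices, which is the behaviour one expects from a system of primal shatter dimension 2.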
From now on we say that has primal shatter dimension if , for all , where and are constants. A notion closely related to the primal shatter dimension is that of the VC-dimension:
An index sequence is shattered by if . The VC-dimension of , denoted by is the size of the longest sequence shattered by . That is, .
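As a small illustration of shattering, the following sketch computes the VC-dimension of a tiny set of indicator vectors by exhaustive search. The helper names are hypothetical, and the early exit relies on the standard fact that every subsequence of a shattered sequence is itself shattered. It is exponential in the number of coordinates and is intended only for experimentation.

```python
from itertools import combinations, product

def is_shattered(vectors, indices):
    """True if the projection onto `indices` contains all 2^k binary patterns."""
    patterns = {tuple(v[i] for i in indices) for v in vectors}
    return len(patterns) == 2 ** len(indices)

def vc_dimension(vectors):
    """Size of the longest index sequence shattered by the vectors (brute force)."""
    vectors = list(vectors)
    n = len(vectors[0])
    best = 0
    for k in range(1, n + 1):
        if any(is_shattered(vectors, idx) for idx in combinations(range(n), k)):
            best = k
        else:
            # If no k-sequence is shattered, no longer sequence can be shattered.
            break
    return best

cube = list(product((0, 1), repeat=3))      # every subset of a 3-point ground set
print(vc_dimension(cube))                   # 3
print(vc_dimension([(0, 0, 0), (1, 1, 1)])) # 1
```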
The notions of primal shatter dimension and VC-dimension are interrelated. By the Sauer-Shelah Lemma (see [33, 35] and the discussion below) the VC-dimension of a set system always bounds its primal shatter dimension, that is, . On the other hand, when the primal shatter dimension is bounded by , the VC-dimension does not exceed (which is straightforward by definition, see, e.g., ).
A typical family of set systems that arise in geometry with bounded primal shatter (resp., VC-) dimension consists of set systems defined over points in some low-dimensional space , where represents a collection of certain simply-shaped regions, e.g., halfspaces, balls, or simplices in . In such cases, the primal shatter (and VC-) dimension is a function of ; see, e.g.,  for more details. When we flip the roles of points and regions, we obtain the so-called dual set systems (where we refer to the former as primal set systems). In this case, the ground set is a collection of algebraic surfaces in , and corresponds to faces of all dimensions in the arrangement of , that is, this is the decomposition of into connected open cells of dimensions induced by . Each cell is a maximal connected region that is contained in the intersection of a fixed number of the surfaces and avoids all other surfaces; in particular, the -dimensional cells of are called “vertices”, and -dimensional cells are simply referred to as “cells”; see  for more details. The distinction between primal and dual set systems in geometry is essential, and set systems of both kinds appear in numerous geometric applications, see, once again  and the references therein.
The length $\|v\|$ of a vector $v \in V$ under the $L^1$ norm is defined as $\sum_{i=1}^{n} |v_i|$, where $v_i$ is the $i$th coordinate of $v$, $i = 1, \ldots, n$. The distance $\rho(u, v)$ between a pair of vectors $u, v \in V$ is defined as the $L^1$ norm of the difference $u - v$, that is, $\rho(u, v) = \sum_{i=1}^{n} |u_i - v_i|$. In other words, it is the symmetric difference distance between the corresponding sets represented by $u$, $v$.
Let $\delta > 0$ be an integer parameter. We say that a subset of vectors $W \subseteq V$ is $\delta$-separated if for each pair $u, v \in W$, $\rho(u, v) > \delta$. The $\delta$-packing number for $V$, denoted by $M(\delta, V)$, is then defined as the cardinality of the largest $\delta$-separated subset $W \subseteq V$. A key property, originally shown by Haussler  (see also [8, 9, 11, 29, 37]), is that set systems of bounded primal shatter dimension admit small $\delta$-packing numbers. That is:
Let $V \subseteq \{0,1\}^n$ be a set of indicator vectors of primal shatter dimension $d$, and let $1 \le \delta \le n$ be an integer parameter. Then $M(\delta, V) = O((n/\delta)^d)$, where the constant of proportionality depends on $d$.
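To make the notion of a separated set tangible, here is a minimal sketch that scans the vectors once and keeps each vector whose Hamming distance to all previously kept vectors exceeds the separation parameter. The names `hamming` and `greedy_separated` are hypothetical. Note the caveat: the greedy scan returns a maximal separated subset, whose size is only a lower bound on the packing number; it is not guaranteed to be a largest one.

```python
from itertools import combinations, product

def hamming(u, v):
    """Symmetric difference (Hamming) distance between two indicator vectors."""
    return sum(a != b for a, b in zip(u, v))

def greedy_separated(vectors, delta):
    """Greedily build a maximal subset whose pairwise distances all exceed delta."""
    chosen = []
    for v in vectors:
        if all(hamming(v, w) > delta for w in chosen):
            chosen.append(v)
    return chosen

cube = sorted(product((0, 1), repeat=3))
W = greedy_separated(cube, 1)
print(W)  # [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

In this toy case the greedy result happens to be optimal, since no five vectors of the 3-dimensional cube are pairwise at distance at least 2.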
We note that in the original formulation in  the assumption is that the set system has a finite VC-dimension. However, its formulation in , which is based on a simplification of the analysis of Haussler by Chazelle , relies on the assumption that the primal shatter dimension is , which is the actual bound that we state in Theorem 1.3. We also comment that a closer inspection of the analysis in  shows that this assumption can be replaced with that of having bounded primal shatter dimension (independent of the analysis in ). We describe these considerations in Section 2.
Previous work. In his seminal work, Dudley  presented the first application of chaining, a proof technique due to Kolmogorov, to empirical process theory, where he showed the bound on , with a constant of proportionality depending on the VC-dimension (see also previous work by Haussler  and Pollard  for an alternative proof and a specification of the constant of proportionality). This bound was later improved by Haussler , who showed (see also Theorem 1.3), and presented a matching lower bound, which leaves only a constant factor gap, which depends exponentially in . In fact, the aforementioned bounds are more general, and can also be applied to classes of real-valued functions of finite “pseudo-dimension” (the special case of set systems corresponds to Boolean functions), see, e.g., , however, we do not discuss this generalization in this paper and focus merely on set systems of finite primal shatter (resp., VC-) dimension.
The bound of Haussler  (Theorem 1.3) is in fact a generalization of the so-called Sauer-Shelah Lemma [33, 35], asserting that , where is the base of the natural logarithm, and thus this bound is . Indeed, when , the corresponding -separated set should include all vectors in , and then the bound of Haussler  becomes , matching the Sauer-Shelah bound up to a constant factor that depends on .
There have been several studies extending Haussler’s bound or improving it in some special scenarios. We name only a few of them. Gottlieb et al.  presented a sharpening of this bound when is relatively large, i.e., is close to , in which case the vectors are “nearly orthogonal”. They also presented a tighter lower bound, which considerably simplifies the analysis of Bshouty et al. , who achieved the same tightening.
A major application of packing is in obtaining improved bounds on the sample complexity in machine learning. This was studied by Li et al.  (see also ), who presented an asymptotically tight bound on the sample complexity, in order to guarantee a small “relative error.” This problem has been revisited by Har-Peled and Sharir  in the context of geometric set systems, where they referred to a sample of the above kind as a “relative approximation” (discussed in Appendix C), and showed how to integrate it into an approximate range counting machinery, which is a central application in computational geometry. The packing number has also been used by Welzl  in order to construct spanning trees of low crossing number (see also ) and by Matoušek [28, 29] in order to obtain asymptotically tight bounds in geometric discrepancy. We discuss these applications in the context of the problem studied in this paper in Section 4.
In the sequel, we refine the bound in the Packing Lemma (Theorem 1.3) so that it becomes sensitive to the length of the vectors , based on an appropriate refinement of the underlying primal shatter function. This refinement has several geometric realizations. Our ultimate goal is to show that when the set system is “shallow” (that is, the underlying vectors are short), the packing number becomes much smaller than the bound in Theorem 1.3.
Nevertheless, we cannot always enforce such an improvement, as in some settings the worst-case asymptotic bound on the packing number is even when the set system is shallow. We demonstrate such a scenario by considering dual set systems of axis-parallel rectangles and points in the plane, where one can have a large subcollection that is both -separated and -shallow. In this case , which is not any better than the “standard” bound (stated in Theorem 1.3) obtained without the shallowness assumption. See Figure 1 and Appendix A, where we give a more detailed description of this construction to the non-expert reader.
Therefore, in order to obtain an improvement on the packing number of shallow set systems,
we may need further assumptions on the primal shatter function.
Such assumptions stem from the random sampling technique of Clarkson and Shor ,
which we define as follows.
Let be our set system.
We assume that for any sequence of indices,
and for any parameter , the number of vectors in
of length at most is only , where is the primal shatter dimension and
is a real parameter.
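The quantity controlled by this assumption, namely the number of short vectors in a projection, can be counted directly on small examples. The sketch below uses hypothetical names and a toy system of intervals over collinear points (an assumption of the example, not a system treated in the text); it only illustrates what is being counted and does not verify the asymptotic bound.

```python
def short_vectors_in_projection(vectors, indices, k):
    """Number of distinct projected vectors of length (number of ones) at most k."""
    projected = {tuple(v[i] for i in indices) for v in vectors}
    return sum(1 for p in projected if sum(p) <= k)

def interval_system(n):
    """Indicator vectors of all intervals (including the empty one) over n collinear points."""
    return {tuple(1 if i <= t < j else 0 for t in range(n))
            for i in range(n + 1) for j in range(i, n + 1)}

V = interval_system(6)
I = (0, 2, 3, 5)  # a subsequence of four of the six indices
print([short_vectors_in_projection(V, I, k) for k in range(0, 5)])  # [1, 5, 8, 10, 11]
```

For intervals the count at threshold k over m sampled indices is at most 1 + mk, that is, linear in m times linear in k, which is the kind of size-sensitive behaviour the assumption is meant to capture.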
Let us now denote by the -packing number of , where the vector length of each element in is at most , for some integer parameter . By these assumptions, we can assume, without loss of generality, that , as otherwise the distance between any two elements in must be strictly less than , in which case the packing is empty. We also assume that (where is the VC-dim), as otherwise the bound on the packing number is a constant that depends on and by the Packing Lemma (Theorem 1.3). The choice of this threshold is justified in Section 3 where we present the analysis and show our main result, which we state below:
Theorem 1.4 (Shallow Packing Lemma).
Let be a set of indicator vectors, whose primal shatter function has a Clarkson-Shor property, and whose VC-dim is . Let be an integer parameter between and , an integer parameter between and , and suppose that . Then:
where the constant of proportionality depends on .
This problem has initially been addressed by the author in  as a major tool to obtain size-sensitive discrepancy bounds in set systems of this kind, where it has been shown . The analysis in  is a refinement over the technique of Dudley  combined with the existence of small-size relative approximations (see  and Appendix C for more details). In the current analysis we completely remove the extra factor appearing in the previous bound. In particular, when (where we just have the original assumption on the primal shatter function) or (in which case each vector in has an arbitrary length), our bound matches the tight bound of Haussler, and thus appears as a generalization of the Packing Lemma (when replacing VC-dimension by primal shatter dimension).
Next, in Section 4.2 we present an application of Theorem 1.4 to “spanning trees with low total conflict number”, which is based on the machinery of Welzl  to construct spanning trees of low crossing number (see also ). Here the tree spans (representing, say, a set of regions defined over points in -space), and the “conflict number” of an edge is the symmetric difference distance between and . Based on this structure we introduce a general framework to efficiently compute various measures arising in geometric optimization (e.g., diameter, width, radius of the smallest enclosing ball, volume of the minimum bounding box, etc.) in each region represented by , where the key idea is to keep the overall number of updates small (given a spanning tree of the above kind). In Section 4.3 we show an application in geometric discrepancy, where the goal is to obtain discrepancy bounds that are sensitive to the length of the vectors in . Due to the bound in Theorem 1.4 we obtain an improvement over the one presented in .
Beyond the geometric applications, this paper is primarily an extension of Haussler’s technique  to shallow set systems. We note that whereas the analysis of Dudley  is fairly simple and intuitive, the analysis of Haussler  is much more intricate, and thus the initial effort in this study was to understand Haussler’s analysis, whose conclusions are summarized in Appendix B. We are also aware of the simplification to Haussler’s proof by Chazelle , nevertheless, we had to use the observations made in  in order to proceed with our analysis. Our main conclusion about the analysis in  is given in Inequality (1), which implies that the cardinality of a -separated set is bounded (up to a factor of ) by the expected number of vectors in the projection of onto a random sample of indices, where is the VC-dimension. Although this simple observation is not explicitly stated in , this relation is a key property used in our analysis.
Overview of Haussler’s Approach.
For the sake of completeness, we repeat some of the details in the analysis of Haussler  and use similar notation for ease of presentation.
Let be a collection of indicator vectors of bounded primal shatter dimension , and denote its VC-dimension by . By the discussion above, . From now on we assume that is -separated, and thus a bound on is also a bound on the packing number of . The analysis in  exploits the method of “conditional variance” in order to conclude
where is the expected size of when projected onto a subset of indices chosen uniformly at random without replacements from , and
We justify this choice in Appendix B, as well as the facts that and consists of precisely indices.
For the sake of completeness, we review Haussler’s approach in Appendix B, and also emphasize some of the properties there,
which are fundamental in our view. Moreover, we refine Haussler’s analysis to include two natural extensions:
(i) Obtain a refined bound on :
This extension is a direct consequence of Inequality (1).
In the analysis of Haussler is replaced by its upper bound , resulting from the fact that
the primal shatter dimension of (and thus of ) is , from which we obtain that for any choice
of , , with a constant of proportionality that depends on , and thus the packing
number is , as asserted in Theorem 1.3.
3 The Analysis: Refining Haussler’s Approach
Overview of the approach. We next present the proof of Theorem 1.4. In what follows, we assume that is -separated. We first recall the assumption that the primal shatter function of has a Clarkson-Shor property, and that the length of each vector under the norm is at most . This implies that consists of at most vectors.
Since the Clarkson-Shor property is hereditary, this also applies to any projection of onto a subset of indices, implying that the bound on is at most , where is a subset of indices as above. However, due to our sampling scheme we expect that the length of each vector in should be much smaller than , (e.g., in expectation this value should not exceed ), from which we may conclude that the actual bound on is smaller than the trivial bound . Ideally, we would like to show that this bound is , which matches our asymptotic bound in Theorem 1.4 (recall that ). However, this is likely to happen only in the case where the length of each vector in does not exceed its expected value, or where there are only a few vectors whose length deviates from its expected value by far, whereas in the worst case there might be many leftover “long” vectors in . Nevertheless, our goal is to show that, with some care, one can proceed in iterations, where initially is a slightly larger sample, and then at each iteration we reduce its size, until eventually it becomes and we remain with only a few long vectors. At each such iteration is a random structure that depends on the choice of and may thus contain long vectors; in expectation, however, they will be scarce.
Specifically, we proceed over at most iterations, where we perform local improvements over the bound on , as follows. Let be the bound on after the th iteration is completed, . We first show in Corollary 3.2 that for the first iteration, , with a constant of proportionality that depends on . Then, at each further iteration , we select a set of indices uniformly at random without replacements from (see (3) for the bound on ). Our goal is to bound using the bound , obtained at the previous iteration, which, we assume by induction to be (where the base case is shown in Corollary 3.2).
A key property in the analysis is then to show that the probability that the length of a vector (after the projection of onto ) deviates from its expectation decays exponentially (Lemma 3.3). Note that in our case this expectation is at most . This, in particular, enables us to claim that in expectation the overwhelming majority of the vectors in have length at most , whereas the remaining longer vectors are scarce. Specifically, since the Clarkson-Shor property is hereditary, we apply it to and conclude that the number of its vectors of length at most is only , with a constant of proportionality that depends on . On the other hand, due to Lemma 3.3 and our inductive hypothesis, the number of longer vectors does not exceed , which is dominated by the first bound. We thus conclude . Then we apply Inequality (4) in order to complete the inductive step, whence we obtain the bound on , and thus on . These properties are described more rigorously in Lemma 3.4, where we derive a recursive inequality for using the bound on . We emphasize the fact that the sample is always chosen from the original ground set , and thus, at each iteration we construct a new sample from scratch, and then exploit our observation in (4).
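The behaviour of a projected length under sampling without replacement can be checked exactly on small instances: the projected length of a fixed vector is a hypergeometric random variable, so a vector with ℓ ones, projected onto m of the n indices chosen uniformly at random, has expected length ℓ·m/n. The sketch below, with hypothetical function names, enumerates all samples and reports the exact distribution, mean, and upper tail. It illustrates the quantity whose deviations the analysis controls; it is not the argument of Lemma 3.3.

```python
from fractions import Fraction
from itertools import combinations

def projected_length_distribution(v, m):
    """Exact distribution of the length of v projected onto a uniform m-subset of indices."""
    counts = {}
    total = 0
    for idx in combinations(range(len(v)), m):
        length = sum(v[i] for i in idx)
        counts[length] = counts.get(length, 0) + 1
        total += 1
    return {length: Fraction(c, total) for length, c in counts.items()}

def expected_projected_length(v, m):
    """Mean of the projected length; equals (number of ones in v) * m / n."""
    return sum(l * p for l, p in projected_length_distribution(v, m).items())

def tail_probability(v, m, t):
    """Probability that the projected length is at least t."""
    return sum(p for l, p in projected_length_distribution(v, m).items() if l >= t)

v = (1, 1, 1, 0, 0, 0, 0, 0)           # a vector of length 3 over n = 8 indices
print(expected_projected_length(v, 4))  # 3/2
print(tail_probability(v, 4, 3))        # 1/14
```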
3.1 The First Iteration
In order to show our bound on , we form
a subset of indices
A sample as above satisfies properties (i)–(ii), with probability at least .
We next apply Lemma 3.1 in order to bound . We first recall that the Clarkson-Shor property of the primal shatter function of is hereditary. Incorporating the bound on and property (ii), we conclude that
with a constant of proportionality that depends on . Now, due to property (i), , we thus conclude:
After the first iteration we have: , with a constant of proportionality that depends on .
Remark: We note that the preliminary bound given in Corollary 3.2 is crucial for the analysis, as it constitutes the base for the iterative process described in Section 3.2. In fact, this step of the analysis alone bypasses our refinement to Haussler’s approach, and instead exploits the approach of Dudley .
3.2 The Subsequent Iterations: Applying the Inductive Step
Let us now fix an iteration . As noted above, we assume by induction on that the bound on after the th iteration is . Let be a subset of indices, chosen uniformly at random without replacements from , with given by (3). Let , and denote by its projection onto . The expected length of is at most . We next show (see Appendix D for the proof):
Lemma 3.3 (Exponential Decay Lemma).
where is a real parameter and is the base of the natural logarithm.
where is a sufficiently large constant, and is another constant whose choice depends on and , and can be made arbitrarily large. Since we obviously have . We next show:
Under the assumption that , we have, at any iteration :
where is the bound on after the th iteration, and is a constant that depends on (and ) and the constant of proportionality determined by the Clarkson-Shor property of .
In order to obtain the first term in the bound on , we consider all vectors of length at most (where is a sufficiently large constant as above) in the projection of onto a subset of indices (in this part of the analysis can be arbitrary). Since the primal shatter function of has a Clarkson-Shor property, which is hereditary, we obtain at most
vectors in of length smaller than . It is easy to verify that the constant of proportionality in the bound just obtained depends on , , and the constant of proportionality determined by the Clarkson-Shor property of .
Next, in order to obtain the second term, we consider the vectors that are mapped to vectors with . By Inequality (5):
and recall that is the bound on after the previous iteration . This completes the proof of the lemma.
Remark: We note that the bound on consists of the worst-case bound on the number of short vectors of length at most , obtained by the Clarkson-Shor property, plus the expected number of long vectors.
Wrapping up. We now complete the analysis and solve Inequality (6).
Our initial assumption that , and the fact that is sufficiently large,
imply that the coefficient of the recursive term is smaller than , for any .
for any .
We thus conclude . In particular, at the termination of the last iteration , we obtain:
with a constant of proportionality that depends on (and ). This at last completes the proof of Theorem 1.4.
4.1 Realization to Geometric Set Systems
We now incorporate the random sampling technique of Clarkson and Shor  with Theorem 1.4
in order to conclude that small shallow packings exist in several useful scenarios.
This includes the case where represents:
(i) A collection of halfspaces defined over an -point set in -space.
In this case, for any integer parameter , the number of halfspaces that contain at most
points is , and thus the primal shatter function has a Clarkson-Shor property.
(ii) A collection of balls defined over an -point set in -space.
Here, the number of balls that contain at most points is , and therefore
the primal shatter function has a Clarkson-Shor property.
(iii) A collection of parallel slabs (that is, each of these regions is enclosed between two parallel hyperplanes and has an arbitrary width),
defined over an -point set in -space.
The number of slabs that contain at most points is .
(iv) A dual set system of points in -space and a collection of -variate (not necessarily continuous or totally defined)
functions of constant description complexity. Specifically, the graph of each function is a semi-algebraic set in defined by
a constant number of polynomial equalities and inequalities of constant maximum degree (see [34, Chapter 7] for a detailed description
of these properties, which we omit here).
Let be a set of indicator vectors representing a set system of halfspaces defined over an -point set in -space, and let be two integer parameters as in Theorem 1.4. Then:
where the constant of proportionality depends on .
Let be a set of indicator vectors representing a set system of balls defined over an -point set in -space, and let be two integer parameters as in Theorem 1.4. Then:
where the constant of proportionality depends on .
Let be a set of indicator vectors representing a set system of parallel slabs defined over an -point set in -space, and let be two integer parameters as in Theorem 1.4. Then:
where the constant of proportionality depends on .
Let be a set of indicator vectors representing a dual set system of -variate (not necessarily continuous or totally defined) functions of constant description complexity and points in -space. Let be two integer parameters as in Theorem 1.4. Then:
where the constant of proportionality depends on and on .
4.2 Spanning Trees of Low Total Conflict Number
Suppose we are given a set of points in -space and a set of regions defined over .
With a slight abuse of notation, we also refer to as the corresponding set system defined over .
Let be a graph with vertex set . We say that a point conflicts with an edge if . We then define the conflict number of an edge of as the number of points with which it is in conflict, and then the total conflict number of is the sum of the conflict numbers, over all its edges. Using similar arguments as in [29, Lemma 5.18] and , one can show (we omit the easy proof):
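Since the conflict number of an edge is the size of the symmetric difference of its two endpoint sets, a spanning tree of smallest total conflict number is simply a minimum spanning tree of the complete graph under these weights. The following sketch computes it with a quadratic-time version of Prim's algorithm. The function names are hypothetical, and this brute-force construction is shown only to pin down the definitions; it is not the subquadratic approximation scheme discussed in the text.

```python
def conflict_number(a, b):
    """Number of points in conflict with the edge (a, b): the size of the symmetric difference."""
    return len(a ^ b)

def min_conflict_spanning_tree(sets):
    """Prim's algorithm on the complete graph weighted by conflict numbers."""
    sets = [frozenset(s) for s in sets]
    n = len(sets)
    if n == 0:
        return [], 0
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known connection of each vertex to the tree
    parent = [-1] * n
    best[0] = 0
    edges, total = [], 0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
            total += best[u]
        for w in range(n):
            if not in_tree[w]:
                c = conflict_number(sets[u], sets[w])
                if c < best[w]:
                    best[w], parent[w] = c, u
    return edges, total

regions = [{1, 2}, {1, 2, 3}, {2, 3}, {7}]
print(min_conflict_spanning_tree(regions))  # ([(0, 1), (1, 2), (0, 3)], 5)
```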
Let be a set system of sets, defined over an -point set . Assume has a Clarkson-Shor property, and that , for each , where is an integer parameter. Then there exists a spanning tree over , whose total conflict number is .
Constructing an approximation of the tree. In order to construct efficiently, we relax this problem to only approximating the spanning tree, and then use the machinery of Har-Peled and Indyk  in order to conclude that when is a set system as in Corollaries 4.1–4.3, a -factor approximation for (that is, a spanning tree whose total conflict number is at most of the smallest such number) can be computed in subquadratic time, for any . We present a sketch of this construction in Appendix E, and conclude:
Let be a set of points in -space and let be a set of regions defined over . Then, for any , one can compute in time
(i) a spanning tree of , where is a set of halfspaces in -space, with overall conflicts.
(ii) , a spanning tree as above, where is a set of balls in -space, with overall conflicts.
(iii) , a spanning tree as above, where is a set of parallel slabs in -space, with overall conflicts.
In the above bounds hides a poly-logarithmic factor.
Based on the above machinery, we next propose a general framework for updating the optimal solution (or an approximate solution) of a prescribed geometric optimization problem, over all regions in .
A general framework. Let , be as in Lemma 4.5, and let be a function that assigns real values on the sets (each being a subset of ). For example, may correspond to diameter, width, radius of the smallest enclosing ball, volume of the minimum bounding box, etc., see, e.g., , where these measures are referred to as “faithful measures”. Our goal is to efficiently compute for each .
Specifically, we assume to have a data-structure that maintains a subset with the following properties: (i) The time to preprocess is . (ii) The time to update (that is, inserting or deleting an element from ) is . (iii) At any given time, querying for the value of , w.r.t. the set of the currently stored points, costs time.
Having this machinery at hand, in a brute-force approach, is initially empty. Then, for each , we insert its elements into , obtain by querying , and then remove all these elements from . We proceed in this manner until all sets are exhausted. Under the assumption that is -shallow, the resulting running time is . On the other hand, with the existence of a -factor approximation for the spanning tree (with properties as in Lemma 4.5), we can proceed as follows. With a slight abuse of notation, we also denote the approximate tree by . Initially, is empty as above, and we choose an arbitrary set , for which we compute as above. Then we traverse from in a BFS manner, update accordingly, and make a query at each vertex tracked during the search. Clearly, the number of these updates is proportional to the overall number of conflicts in , and thus the overall running time is , where is the time to construct (a -factor approximation for) . We are interested in the scenario where this solution outperforms the brute-force algorithm (at least for some values of ). Below we describe a concrete scenario, related to the approximation of the faithful measures listed above, which demonstrates the usefulness of our framework.
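The saving over the brute-force approach can be sketched by counting insert and delete operations while walking a given spanning tree. Two assumptions of this sketch should be stated plainly: the dynamic data structure is replaced by a plain Python set together with a user-supplied `measure` function, and the tree is traversed by a depth-first walk (rather than the breadth-first order mentioned above) so that consecutive sets are always joined by a tree edge. Each edge is then crossed twice, and the number of updates is the size of the root set plus twice the total conflict number. All names are hypothetical.

```python
def move(current, source, target):
    """Turn `current` (equal to `source`) into `target`; return the number of updates."""
    diff = source ^ target
    current.symmetric_difference_update(diff)  # delete points leaving, insert points entering
    return len(diff)

def evaluate_along_tree(sets, edges, measure, root=0):
    """Evaluate `measure` on every set by walking the tree; return (values, update count)."""
    sets = [frozenset(s) for s in sets]
    adj = {i: [] for i in range(len(sets))}
    for u, w in edges:
        adj[u].append(w)
        adj[w].append(u)
    current = set(sets[root])
    updates = len(current)                      # initial insertions of the root set
    values = {root: measure(current)}
    stack = [(root, iter(adj[root]))]
    while stack:
        u, neighbours = stack[-1]
        advanced = False
        for w in neighbours:
            if w not in values:
                updates += move(current, sets[u], sets[w])
                values[w] = measure(current)
                stack.append((w, iter(adj[w])))
                advanced = True
                break
        if not advanced:
            stack.pop()
            if stack:                           # walk back to the parent set
                updates += move(current, sets[u], sets[stack[-1][0]])
    return values, updates

def diameter_1d(points):
    """Toy measure: diameter of a set of points on the real line."""
    return max(points) - min(points) if points else 0

regions = [{1, 2}, {1, 2, 3}, {2, 3}, {7}]
tree = [(0, 1), (1, 2), (0, 3)]
print(evaluate_along_tree(regions, tree, diameter_1d))  # ({0: 1, 1: 2, 2: 1, 3: 0}, 12)
```

On this toy input the brute-force approach performs 16 updates (each set is inserted and then removed in full), compared with 12 along the tree; the gap widens when neighbouring sets in the tree differ in only a few points.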
Dynamic coresets. Based on the seminal work of Agarwal et al.  on coresets, Chan  presented a data structure, which maintains a constant-size coreset, with respect to “extent”, in update time, for all constant dimensions, with linear space and preprocessing time, where the constant of proportionality depends on the error parameter . Using this machinery, it is straightforward to obtain dynamic -factor approximation algorithms with logarithmic update time for computing the faithful measures stated above. The time to compute an approximation for these measures depends on and the dimension . For simplicity of presentation, we omit the dependency on in the bounds of , , and compare the performance of our approach w.r.t. the brute force computation, when we consider only the parameters , , and .
With this machinery, the brute-force algorithm runs in time, whereas our algorithm runs in time. We now consider the bounds stated in Corollary 4.6 and set , in which case the term becomes nearly-linear and we pay only an extra logarithmic factor in the total number of conflicts. We then conclude:
Let be a set of points in -space and let be a set of regions defined over , where, for each , , where is an integer parameter. Then one can compute a -factor approximation for the aforementioned faithful measures in time that is the minimum of and
(i) , if is a collection of halfspaces, in -space.
(ii) , if is a collection of balls in -space.
(iii) , if is a collection of parallel slabs in -space.
The constant of proportionality in each of these bounds depends on and .
4.3 Geometric Discrepancy
Given a set system as above, we now wish to color the points of by two colors, such that in each set of the deviation from an even split is as small as possible.
Formally, a two-coloring of is a mapping . For a set we define . The discrepancy of is then defined as .
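For intuition, the sketch below evaluates a two-coloring under the standard convention, assumed here, that colors are the values -1 and +1, that the imbalance of a set is the absolute value of the sum of the colors of its points, and that the discrepancy of the set system is the smallest achievable maximum imbalance over all colorings. The exhaustive search over colorings is exponential and the function names are hypothetical; the code serves only to fix the definitions on toy inputs.

```python
from itertools import product

def coloring_discrepancy(sets, chi):
    """Largest imbalance |sum of colors| over all sets, for a coloring chi: point -> -1 or +1."""
    return max((abs(sum(chi[x] for x in s)) for s in sets), default=0)

def discrepancy(points, sets):
    """Minimum over all two-colorings of the largest imbalance (brute force)."""
    points = list(points)
    return min(coloring_discrepancy(sets, dict(zip(points, signs)))
               for signs in product((-1, 1), repeat=len(points)))

points = [0, 1, 2, 3]
even_sets = [{0, 1}, {1, 2}, {2, 3}, {0, 1, 2, 3}]
print(discrepancy(points, even_sets))                # 0, via the alternating coloring
print(discrepancy(points, even_sets + [{0, 1, 2}]))  # 1, since an odd-size set cannot split evenly
```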
In a previous work , the author presented size-sensitive discrepancy bounds for set systems of halfspaces defined over points in -space. These bounds were achieved by combining the entropy method  with -packings, and, as observed in , they are optimal up to a poly-logarithmic factor. Incorporating our bound in Theorem 1.4 into the analysis in , the bounds on improve by a factor. Specifically, we obtain (we omit the technical details in this version):
Let be a set system of halfspaces defined over -points in -space ().
Then, there is a two-coloring , such that for each ,
, for even,
and , for odd,
where the constant of proportionality depends on .
When the bound on is .
5 Concluding Remarks and Further Research
We note that Corollary 4.4 implies that one can pack the “shallow level” in an arrangement of