Generalized comparison trees for point-location problems
Let $H$ be an arbitrary family of hyperplanes in $d$ dimensions. We show that the point-location problem for $H$ can be solved by a linear decision tree that uses only a special type of query, called a generalized comparison query. These queries correspond to hyperplanes that can be written as a linear combination of two hyperplanes from $H$; in particular, if all hyperplanes in $H$ are $k$-sparse then generalized comparisons are $2k$-sparse. The depth of the obtained linear decision tree is polynomial in $d$ and logarithmic in $|H|$, which is comparable to previous results in the literature that use general linear queries.
This extends the study of comparison trees from a previous work by the authors [Kane et al., FOCS 2017]. The main benefit is that generalized comparison queries overcome limitations that apply to the more restricted comparison queries.
Our analysis combines a seminal result of Forster regarding sets in isotropic position [Forster, JCSS 2002], the margin-based inference dimension analysis for comparison queries from [Kane et al., FOCS 2017], and compactness arguments.
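Let $H \subset \mathbb{R}^d$ be a family of $|H| = n$ hyperplanes. $H$ partitions $\mathbb{R}^d$ into $O(n^d)$ cells. The point-location problem is to decide, given an input point $x \in \mathbb{R}^d$, to which cell it belongs. That is, to compute the function
$$A_H(x) := \big(\mathrm{sign}(\langle x, h\rangle) : h \in H\big) \in \{-1, 0, 1\}^{H}.$$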
A well-studied computation model for this problem is a linear decision tree (LDT): this is a ternary decision tree whose input is $x \in \mathbb{R}^d$ and whose internal nodes $v$ make linear/threshold queries of the form $\mathrm{sign}(\langle x, q\rangle)$ for some $q = q(v) \in \mathbb{R}^d$. The three children of $v$ correspond to the three possible outputs of the query: “$-$”, “$0$”, “$+$”. The leaves of the tree are labeled with elements of $\{-1, 0, 1\}^{H}$, corresponding to the cell in the arrangement that contains $x$. The complexity of a linear decision tree is its depth, which corresponds to the maximal number of linear queries made on any input.
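A comparison decision tree is a special type of LDT, in which all queries are of one of two types:
Label query: “$\mathrm{sign}(\langle x, h\rangle) = ?$” for $h \in H$.
Comparison query: “$\mathrm{sign}(\langle x, h' - h''\rangle) = ?$” for $h', h'' \in H$.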
In [KLMZ17] it is shown that when $H$ is “nice” there exist comparison decision trees that compute $A_H$ and have nearly optimal depth (up to logarithmic factors). For example, for any $H \subset \{-1, 0, 1\}^d$ there is a comparison decision tree with depth $O(d \log d \log |H|)$. This is off by a $\log d$ factor from the basic information-theoretic lower bound of $\Omega(d \log |H|)$. Moreover, it is shown there that certain niceness conditions are necessary. Concretely, they give an example of $H \subset \mathbb{R}^3$ such that any comparison decision tree that computes $A_H$ requires depth $\Omega(|H|)$. This raises the following natural problem: can comparison decision trees be generalized in a way that allows them to handle arbitrary point-location problems?
This paper addresses the above question by considering generalized comparison queries. A generalized comparison query allows its terms to be re-weighted: namely, it is a query of the form
$$\mathrm{sign}(\langle x, \alpha h' - \beta h''\rangle)$$
for $h', h'' \in H$ and some $\alpha, \beta \in \mathbb{R}$. Note that it may be assumed without loss of generality that $|\alpha| + |\beta| = 1$. A generalized comparison decision tree, naturally, is a linear decision tree whose internal linear queries are restricted to be generalized comparisons. Note that generalized comparison queries include as special cases both label queries (setting $\alpha = 1, \beta = 0$) and comparison queries (setting $\alpha = \beta = 1/2$).
Geometrically, generalized comparisons are 1-dimensional in the following sense: let $q = \alpha h' - \beta h''$ with $\alpha, \beta \ge 0$ and $\alpha + \beta = 1$; then $q$ lies on the interval connecting $h'$ and $-h''$. If $\alpha$ and $\beta$ have different signs, $q$ lies on an interval between some other pair $\pm h'$ and $\pm h''$. So generalized comparison queries are linear queries that lie on the projective line intervals spanned by $\{\pm h : h \in H\}$. In particular, if each $h \in H$ has sparsity at most $k$ (namely, at most $k$ nonzero coordinates) then each generalized comparison has sparsity at most $2k$.
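The three query types discussed above can be illustrated with a minimal sketch. It assumes hyperplanes are represented by their normal vectors and that a generalized comparison has the form $\mathrm{sign}(\langle x, \alpha h' - \beta h''\rangle)$; the function names and the example vectors are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

def sign(t: float) -> int:
    """Ternary sign: -1, 0 or +1."""
    return (t > 0) - (t < 0)

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def label_query(x: Sequence[float], h: Sequence[float]) -> int:
    """Label query: sign(<x, h>)."""
    return sign(dot(x, h))

def generalized_comparison(
    x: Sequence[float], h1: Sequence[float], h2: Sequence[float],
    alpha: float, beta: float,
) -> Tuple[int, List[float]]:
    """Generalized comparison: sign(<x, alpha*h1 - beta*h2>).

    Also returns the query vector q so that its sparsity can be inspected.
    """
    q = [alpha * a - beta * b for a, b in zip(h1, h2)]
    return sign(dot(x, q)), q

def comparison_query(x: Sequence[float], h1: Sequence[float], h2: Sequence[float]) -> int:
    """Plain comparison: the special case alpha == beta (the sign is scale-invariant)."""
    return generalized_comparison(x, h1, h2, 1.0, 1.0)[0]

def sparsity(v: Sequence[float]) -> int:
    """Number of nonzero coordinates."""
    return sum(1 for a in v if a != 0)

if __name__ == "__main__":
    x = [1, 1, 1, 1, 1]
    h1 = [1, 0, 2, 0, 0]   # 2-sparse
    h2 = [0, 0, 0, 3, -1]  # 2-sparse
    s, q = generalized_comparison(x, h1, h2, 0.25, 0.75)
    print(label_query(x, h1), comparison_query(x, h1, h2), s, sparsity(q))
```

In this example the re-weighted query and the plain comparison give different answers, and the query vector has at most as many nonzero coordinates as the two input vectors combined.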
Our main result is:
Theorem 1.1 (Main theorem).
Let $H \subset \mathbb{R}^d$. Then there exists a generalized comparison decision tree of depth $O(d^4 \log d \log |H|)$ that computes $A_H(x)$ for every input $x \in \mathbb{R}^d$.
Why consider generalized comparisons?
We consider generalized comparisons for a number of reasons:
The lower bound against comparison queries in [KLMZ17] was achieved by essentially scaling different elements of $H$ with exponentially different scales. Allowing for re-scaling (which is what generalized comparisons permit) solves this problem.
Generalized comparisons may be natural from a machine learning perspective, in particular in the context of active learning. A common type of query used in practice is to give a score to an example (say 1-10), and not just label it as positive (+) or negative (-). Comparing the scores for different examples can be viewed as a “coarse” type of generalized comparison.
If the set of original hyperplanes $H$ was “nice”, then generalized comparisons maintain some aspects of niceness in the queries performed. As an example that was already mentioned, if all hyperplanes in $H$ are $k$-sparse then generalized comparisons are $2k$-sparse. This is part of a more general line of research, studying what types of “simple queries” are sufficient to obtain efficient active learning algorithms, or equivalently efficient linear decision trees for point-location problems.
1.1 Proof outline
Our proof consists of two parts. First, we focus on the case when $H \subset \mathbb{R}^d$ is in general position, namely, every $d$ vectors in it are linearly independent. Then, we extend the construction to arbitrary $H$.
The second part is fairly abstract and is derived via compactness arguments. The technical crux lies in the first part: let $H$ be in general position; we first construct a randomized generalized comparison decision tree for $A_H$, and then derandomize it. The randomized tree is simple to describe: it proceeds in steps, where in each step about $d^2$ elements from $H$ are drawn, labelled, and sorted using generalized comparisons. Then, it is shown that the labels of some $\Omega(1/d)$-fraction of the remaining elements in $H$ are inferred, on average. The inferred vectors are then removed from $H$ and this step is repeated until all labels in $H$ are inferred.
A central technical challenge lies in the analysis of a single step. It hinges on a result by Forster [For02] that transforms a general-positioned $H$ to an isotropic-positioned $H'$ (see formal definition below) in a way that comparison queries on $H'$ correspond to generalized comparison queries on $H$. Then, since $H'$ is in isotropic position, it follows that a significant fraction of $H'$ has a large margin with respect to the input $x$. This allows us to employ a variant of the margin-based inference analysis by [KLMZ17] on $H'$ to derive the desired inference of some $\Omega(1/d)$-fraction of the remaining labels in each step.
The derandomization of the above randomized LDT is achieved by a double-sampling argument due to [VC71]. A similar argument was used in [KLMZ17]; however, several new technical challenges arise here, since in each iteration of the above randomized algorithm we only label a small fraction of the elements on average.
1.2 Related work
The point-location problem has been studied since the 1980s, starting from the pioneering work of Meyer auf der Heide [MadH84] and continuing with Meiser [Mei93], Cardinal et al. [CIO15] and most recently Ezra and Sharir [ES17]. This last work, although not formally stated as such, solves the point-location problem for an arbitrary $H \subset \mathbb{R}^d$ by a linear decision tree whose depth is $O(d^2 \log d \log |H|)$. However, in order to do so, the linear queries used by the linear decision tree could be arbitrary, even when the original family $H$ is very simple (say, $k$-sparse for a small $k$). This is true for all previous works, as they are all based on various geometric partitioning ideas, which may require the use of quite generic hyperplanes. This should be compared with our result (Theorem 1.1). We obtain a linear decision tree of a bigger depth (by a factor of $d^2$); however, the type of linear queries we use remains relatively simple: as discussed earlier, they are 1-dimensional and preserve sparseness.
1.3 Open problems
Our work addresses a problem raised in [KLM17], of whether “simple queries” can be sufficient to solve the point-location problem for general hyperplanes $H$, without making any “niceness” assumptions on $H$. The solution explored here is to allow for generalized comparisons, which are a $1$-dimensional set of allowed queries. An intriguing question is whether this is necessary, or whether there are some $0$-dimensional gadgets that would be sufficient.
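In order to formally define the problem, we need the notion of gadgets. A $t$-ary gadget in $\mathbb{R}^d$ is a function $g : (\mathbb{R}^d)^t \to \mathbb{R}^d$. Let $G$ be a finite collection of gadgets in $\mathbb{R}^d$. Given a set of hyperplanes $H \subset \mathbb{R}^d$, a $G$-LDT that solves $A_H$ is an LDT where any linear query is of the form $\mathrm{sign}(\langle q, x\rangle)$ for $q = g(h_1, \dots, h_t)$, for some $g \in G$ and $h_1, \dots, h_t \in H$. For example, a comparison decision tree corresponds to the gadgets $g_1(h) = h$ (label queries) and $g_2(h_1, h_2) = h_1 - h_2$ (comparison queries). A generalized comparison decision tree corresponds to the $1$-dimensional (infinite) family of gadgets $\{ g_{\alpha,\beta}(h_1, h_2) = \alpha h_1 - \beta h_2 : |\alpha| + |\beta| = 1 \}$. It was shown in [KLMZ17] that comparison decision trees are sufficient to efficiently solve the point-location problem in 2 dimensions, but not in 3 dimensions. So, the problem is already open in $\mathbb{R}^3$.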
Open problem 1.
Fix $d \ge 3$. Is there a finite set of gadgets $G$ in $\mathbb{R}^d$, such that for every $H \subset \mathbb{R}^d$ there exists a $G$-LDT which computes $A_H$, whose depth is logarithmic in $|H|$? Can one hope to get to the information-theoretic lower bound, namely to $O(d \log |H|)$?
Another open problem is whether a randomized LDT can always be derandomized without losing too much in the depth. To recall, a randomized (zero-error) LDT is a distribution over (deterministic) LDTs, each of which computes $A_H$. The measure of complexity for a randomized LDT is the expected number of queries performed, for the worst-case input $x$. The derandomization technique we apply in this work (see Lemma 3.9 and its proof for details) loses a factor of $d$, but it is not clear whether this loss is necessary.
Open problem 2.
Let $H \subset \mathbb{R}^d$. Assume that there exists a randomized LDT which computes $A_H$, whose expected query complexity is at most $D$ for any input. Does there always exist a (deterministic) LDT which computes $A_H$, whose depth is $O(D)$?
2 Preliminaries and some basic technical lemmas
2.1 Linear decision trees
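Let $T$ be a linear decision tree defined on input points $x \in \mathbb{R}^d$. For a vertex $u$ of $T$ let $C(u)$ denote the set of inputs $x$ whose computation path contains $u$. Let $Z(u)$ denote the queries “$\mathrm{sign}(\langle x, q\rangle) = ?$” on the path from the root to $u$ that are replied by “$0$”, and let $V(u)$ denote the subspace $\{ x : \langle x, q\rangle = 0 \text{ for all } q \in Z(u) \}$. We say that $u$ is full dimensional if $V(u) = \mathbb{R}^d$ (i.e. no query on the path towards $u$ is replied by a $0$).
Observation 2.1. $C(u)$ is convex (as an intersection of open halfspaces and hyperplanes).
Observation 2.2. $C(u) \subseteq V(u)$, and $C(u)$ is open with respect to $V(u)$ (that is, it is the intersection of an open set in $\mathbb{R}^d$ with $V(u)$).
We say that $T$ computes $\mathrm{sign}(\langle h, \cdot\rangle)$ if for every leaf $\ell$ of $T$, the restriction of the function $x \mapsto \mathrm{sign}(\langle h, x\rangle)$ to $C(\ell)$ is constant. Thus, $T$ computes $A_H$ if and only if it computes $\mathrm{sign}(\langle h, \cdot\rangle)$ for all $h \in H$. We say that $T$ computes $\mathrm{sign}(\langle h, \cdot\rangle)$ almost everywhere if the restriction of $\mathrm{sign}(\langle h, \cdot\rangle)$ to $C(\ell)$ is constant for every full dimensional leaf $\ell$.
We will use the following corollary of Observations 2.1 and 2.2. It shows that if $\mathrm{sign}(\langle h, \cdot\rangle)$ is not constant in $C(u)$ then it must take all three possible values. In Section 3.4, we show that a linear decision tree that computes $A_H$ almost everywhere can be “extended” to an LDT that computes $A_H$ everywhere, without increasing the depth or introducing new queries. It relies on the following lemma.
Lemma 2.3. Let $u$ be a vertex in $T$, and assume that the restriction of $\mathrm{sign}(\langle h, \cdot\rangle)$ to $C(u)$ is not constant. Then there exist $x_{-1}, x_0, x_1 \in C(u)$ such that $\mathrm{sign}(\langle h, x_i\rangle) = i$ for every $i \in \{-1, 0, 1\}$.
Let $x', x'' \in C(u)$ with $\mathrm{sign}(\langle h, x'\rangle) \ne \mathrm{sign}(\langle h, x''\rangle)$. If $\{\mathrm{sign}(\langle h, x'\rangle), \mathrm{sign}(\langle h, x''\rangle)\} = \{-1, +1\}$ then by continuity of $x \mapsto \langle h, x\rangle$ there exists some $x_0$ on the interval between $x'$ and $x''$ such that $\langle h, x_0\rangle = 0$, and $x_0 \in C(u)$ by convexity.
Else, without loss of generality, $\mathrm{sign}(\langle h, x'\rangle) = 0$ and $\mathrm{sign}(\langle h, x''\rangle) = +1$. Therefore, since $C(u)$ is open relative to $V(u)$:
$$x' - \varepsilon (x'' - x') \in C(u)$$
for some small $\varepsilon > 0$. This finishes the proof since $\langle h, x' - \varepsilon (x'' - x')\rangle = -\varepsilon \langle h, x''\rangle < 0$. ∎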
2.2 Inferring from comparisons
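Let $S \subseteq \mathbb{R}^d$ and let $h, x \in \mathbb{R}^d$.
Definition 2.4 (Inference).
We say that $S$ infers $h$ at $x$ if $\mathrm{sign}(\langle h, x\rangle)$ is determined by the linear queries $\mathrm{sign}(\langle h', x\rangle)$ for $h' \in S$. That is, if for any point $y$ in the set
$$\{ y \in \mathbb{R}^d : \mathrm{sign}(\langle h', y\rangle) = \mathrm{sign}(\langle h', x\rangle) \text{ for all } h' \in S \}$$
it holds that $\mathrm{sign}(\langle h, y\rangle) = \mathrm{sign}(\langle h, x\rangle)$. Define
$$\mathrm{infer}(S; x) := \{ h \in \mathbb{R}^d : S \text{ infers } h \text{ at } x \}.$$
The notion of inference has a natural geometric perspective. Consider the partition of $\mathbb{R}^d$ induced by $S$. Then, $S$ infers $h$ at $x$ if the cell in this partition that contains $x$ is either disjoint from $h$ or otherwise is contained in $h$ (so in either case, the value of $\mathrm{sign}(\langle h, \cdot\rangle)$ is constant on the cell).
Our algorithms and analysis are based on inferences from comparisons. Let $S - S$ denote the set $\{ h' - h'' : h', h'' \in S \}$.
Definition 2.5 (Inference by comparisons).
We say that comparisons on $S$ infer $h$ at $x$ if $S \cup (S - S)$ infers $h$ at $x$. Define
$$\mathrm{InferComp}(S; x) := \mathrm{infer}(S \cup (S - S); x).$$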
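Thus, the set of vectors inferred by comparisons on $S$ at $x$ is determined by querying $\mathrm{sign}(\langle h', x\rangle)$ and $\mathrm{sign}(\langle h' - h'', x\rangle)$ for all $h', h'' \in S$. Naively, this requires some $O(|S|^2)$ linear queries. However, using an efficient sorting algorithm (e.g. merge-sort) achieves it with just $O(|S| \log |S|)$ comparison queries. A further improvement, when $|S| > d$, is obtained by Fredman’s sorting algorithm that uses just $O(|S| + d \log |S|)$ comparison queries [Fre76].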
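The saving from sorting can be seen in a small sketch. It assumes a hypothetical oracle that answers comparison queries $\mathrm{sign}(\langle h' - h'', x\rangle)$ and counts them; Python's built-in sort stands in for merge-sort (it is not Fredman's algorithm), and the random integer instance is made up for illustration.

```python
import random
from functools import cmp_to_key
from itertools import combinations

def sign(t):
    return (t > 0) - (t < 0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

class ComparisonOracle:
    """Hypothetical oracle: answers sign(<h1 - h2, x>) and counts the queries."""
    def __init__(self, x):
        self._x = x
        self.comparisons = 0

    def compare(self, h1, h2):
        self.comparisons += 1
        return sign(dot(self._x, [a - b for a, b in zip(h1, h2)]))

def ranks_from_comparisons(S, oracle):
    """Sort S by <h, x> using only comparison queries and return a rank per element.

    Equal ranks mean equal inner products, so the answer to every pairwise
    comparison query on S equals sign(rank[i] - rank[j]).
    """
    order = sorted(range(len(S)),
                   key=cmp_to_key(lambda i, j: oracle.compare(S[i], S[j])))
    rank = [0] * len(S)
    # One extra pass over adjacent elements detects ties.
    for prev, cur in zip(order, order[1:]):
        strictly_less = oracle.compare(S[prev], S[cur]) < 0
        rank[cur] = rank[prev] + (1 if strictly_less else 0)
    return rank

def demo(n=64, d=5, seed=0):
    rng = random.Random(seed)
    x = [rng.randint(-9, 9) for _ in range(d)]
    S = [[rng.randint(-9, 9) for _ in range(d)] for _ in range(n)]
    oracle = ComparisonOracle(x)
    rank = ranks_from_comparisons(S, oracle)
    # Check (without the oracle) that the ranks reproduce all pairwise answers.
    consistent = all(
        sign(rank[i] - rank[j]) == sign(dot(x, [a - b for a, b in zip(S[i], S[j])]))
        for i, j in combinations(range(n), 2)
    )
    return oracle.comparisons, n * (n - 1) // 2, consistent

if __name__ == "__main__":
    used, naive, ok = demo()
    print(f"comparison queries used: {used}, naive pairwise count: {naive}, consistent: {ok}")
```

Label queries are not counted here; they would add one query per element of $S$.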
2.3 Vectors in isotropic position
Vectors $h_1, \dots, h_m \in \mathbb{R}^d$ are said to be in general position if any $d$ of them are linearly independent. They are said to be in isotropic position if for any unit vector $v \in \mathbb{R}^d$,
$$\frac{1}{m} \sum_{i=1}^{m} \langle h_i, v\rangle^2 = \frac{1}{d}.$$
Equivalently, if $\frac{1}{m} \sum_{i=1}^{m} h_i h_i^{\top}$ is $1/d$ times the identity matrix. An important theorem of Forster [For02] (see also Barthe [Bar98] for a more general statement) states that any set of vectors in general position can be scaled to be in isotropic position.
Theorem 2.6 ([For02]).
Let $H \subset \mathbb{R}^d$ be a finite set in general position. Then there exists an invertible linear transformation $T$ such that the set
$$H' := \left\{ \frac{Th}{\|Th\|_2} : h \in H \right\}$$
is in isotropic position. We refer to such a $T$ as a Forster transformation for $H$.
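We will also need a relaxed notion of isotropic position. Given vectors $h_1, \dots, h_m \in \mathbb{R}^d$ and some $0 < c < 1$, we say that the vectors are in $c$-approximate isotropic position if for all unit vectors $v \in \mathbb{R}^d$ it holds that
$$\frac{1}{m} \sum_{i=1}^{m} \langle h_i, v\rangle^2 \ge \frac{c}{d}.$$
We note that this condition is easy to test algorithmically, as it is equivalent to the statement that the smallest eigenvalue of the positive semi-definite matrix $\frac{1}{m} \sum_{i=1}^{m} h_i h_i^{\top}$ is at least $c/d$.
We summarize it in the following lemma, which follows from basic real linear algebra.
Let $h_1, \dots, h_m \in \mathbb{R}^d$ be unit vectors. Then the following are equivalent.
$h_1, \dots, h_m$ are in $c$-approximate isotropic position.
$\lambda_{\min}\big(\frac{1}{m} \sum_{i=1}^{m} h_i h_i^{\top}\big) \ge c/d$, where $\lambda_{\min}(M)$ denotes the minimal eigenvalue of a positive semidefinite matrix $M$.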
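The eigenvalue test can be sketched in a few lines. The sketch assumes the normalization in which $c$-approximate isotropic position means that the smallest eigenvalue of the averaged second-moment matrix is at least $c/d$, and it is restricted to $d = 2$ so that the eigenvalue has a closed form and no numerical library is needed; all function names are hypothetical.

```python
import math

def second_moment_2d(vectors):
    """Entries (a, b, c) of the symmetric matrix (1/m) * sum_i h_i h_i^T = [[a, b], [b, c]]."""
    m = len(vectors)
    a = sum(h[0] * h[0] for h in vectors) / m
    b = sum(h[0] * h[1] for h in vectors) / m
    c = sum(h[1] * h[1] for h in vectors) / m
    return a, b, c

def min_eigenvalue_sym2(a, b, c):
    """Smallest eigenvalue of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    return (a + c) / 2 - math.hypot((a - c) / 2, b)

def isotropy_constant(vectors):
    """Largest c for which the vectors are c-approximately isotropic (d = 2 only)."""
    d = 2
    return d * min_eigenvalue_sym2(*second_moment_2d(vectors))

def is_approx_isotropic(vectors, c, tol=1e-12):
    """True if the smallest eigenvalue of the second-moment matrix is at least c/d."""
    return isotropy_constant(vectors) >= c - tol

if __name__ == "__main__":
    # Three unit vectors at angles 0, 60 and 120 degrees.
    spread = [(math.cos(t), math.sin(t)) for t in (0.0, math.pi / 3, 2 * math.pi / 3)]
    # Two copies of e_1 and one copy of e_2: the e_2 direction is under-represented.
    skewed = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    print(isotropy_constant(spread), isotropy_constant(skewed))
```

For the skewed example the second-moment matrix is diagonal with entries $2/3$ and $1/3$, so the vectors are $c$-approximately isotropic exactly for $c \le 2/3$.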
We will need the following basic claims. The first claim shows that a set of unit vectors in an approximate isotropic position has many vectors with non-negligible inner product with any unit vector.
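Let $h_1, \dots, h_m \in \mathbb{R}^d$ be unit vectors in a $c$-approximate isotropic position, and let $v \in \mathbb{R}^d$ be a unit vector. Then, at least a $\frac{c}{2d}$-fraction of the $h_i$’s satisfy $|\langle h_i, v\rangle| > \sqrt{c/(2d)}$.
Assume otherwise. It follows that
$$\frac{1}{m} \sum_{i=1}^{m} \langle h_i, v\rangle^2 < \frac{c}{2d} \cdot 1 + 1 \cdot \frac{c}{2d} = \frac{c}{d}.$$
This contradicts the assumption that the $h_i$’s are in $c$-approximate isotropic position. ∎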
The second claim shows that a random subset of a set of unit vectors in an approximate isotropic position is also in approximate isotropic position, with good probability.
Let be unit vectors in -approximate isotropic position. Let be independently and uniformly sampled. Then for any , the vectors are in -approximate isotropic position with probability at least
We will use two instantiations of Claim 2.9: (i) , and , and (ii) and . In both cases the bound simplifies to
3 Proof of main theorem
Let . We prove Theorem 1.1 in four steps:
3.1 A randomized LDT for $H$ in general position
In this section we construct a randomized generalized comparison LDT for $H$ in general position. Here, by a randomized LDT we mean a distribution over (deterministic) LDTs, each of which computes $A_H$. The corresponding complexity measure is the expected number of queries it makes, for the worst-case input $x$.
Let be a finite set in general position. Then there exists a randomized LDT that computes , which makes generalized comparison queries on expectation, for any input.
The proof of Lemma 3.1 is based on a variant of the margin-based analysis of the inference dimension with respect to comparison queries as in [KLMZ17]. (The analysis in [KLMZ17] assumed that all vectors have large margin, whereas here we need to work under the weaker assumption that only a noticeable fraction of the vectors have large margin.) The crux of the proof relies on scaling every $h \in H$ by a carefully chosen scalar $\alpha_h$ such that drawing a sufficiently large random subset $S$ of $H$, and sorting the values $\alpha_h \langle h, x\rangle$ using comparison queries (which correspond to generalized comparisons on the $h$’s), allows one to infer, on average, at least an $\Omega(1/d)$-fraction of the labels of $H$. The scalars $\alpha_h$ are derived via Forster’s theorem (Theorem 2.6). More specifically, $\alpha_h = 1/\|Th\|_2$, where $T$ is a Forster transformation for $H$.
In order to understand the intuition behind the main iteration (2) of the algorithm, define $x' = (T^{-1})^{\top} x$ and for each $h \in H$ let $h' = Th/\|Th\|_2$. Then $\mathrm{sign}(\langle h', x'\rangle) = \mathrm{sign}(\langle h, x\rangle)$, and so it suffices to infer the sign for many $h' \in H'$ with respect to $x'$. The main benefit is that we may assume in the analysis that the set of vectors $H'$ is in isotropic position, and reduce the analysis to that of using (standard) comparisons on $H'$ and $x'$. These then translate to performing generalized comparison queries on $H$ and the original input $x$. The following lemma captures the analysis of the main iteration of the algorithm. Below, we denote by .
Let , let be a finite set of unit vectors in -approximate isotropic position with , and let be a uniformly chosen subset of size . Then
Note that this proves a stronger statement than needed for Lemma 3.1. Indeed, it would suffice to consider only that is in (a complete) isotropic position. This stronger version will be used in the next section for derandomizing the above algorithm. Let us first argue how Lemma 3.1 follows from Lemma 3.2, and then proceed to prove Lemma 3.2.
By Lemma 3.2, in each iteration (2) of the algorithm, we infer on expectation at least fraction of the with respect to . By the discussion above, this is the same as inferring an fraction of the with respect to . So, the total expected number of iterations needed is . Next, we calculate the number of linear queries performed at each iteration. The number of label queries is and the number of comparison queries on (which translate to generalized comparison queries on ) is if we use merge-sort, and can be improved to by using Fredman’s sorting algorithm [Fre76]. So, in each iteration we perform queries, and the expected number of iterations is . So the expected total number of queries by the algorithm is . ∎
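The translation used in the proof above, from comparison queries on the transformed vectors to generalized comparison queries on the original ones, can be checked numerically. The sketch below assumes the correspondence $h' = Th/\|Th\|$ and $x' = (T^{-1})^{\top} x$, works in two dimensions, and uses an arbitrary invertible matrix $T$ chosen for illustration; it does not compute an actual Forster transformation, and all names are hypothetical.

```python
import math

def sign(t):
    return (t > 0) - (t < 0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(M, v):
    """Product of a 2x2 matrix (list of rows) with a 2-vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def inverse_transpose(M):
    """(M^{-1})^T for an invertible 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[1][0] / det],
            [-M[0][1] / det, M[0][0] / det]]

def normalize(v):
    n = math.hypot(v[0], v[1])
    return [v[0] / n, v[1] / n]

def comparison_after_transform(T, h1, h2, x):
    """Standard comparison sign(<h1' - h2', x'>) on the transformed instance."""
    x_t = mat_vec(inverse_transpose(T), x)
    h1_t = normalize(mat_vec(T, h1))
    h2_t = normalize(mat_vec(T, h2))
    return sign(dot(x_t, [a - b for a, b in zip(h1_t, h2_t)]))

def generalized_comparison_on_original(T, h1, h2, x):
    """The same answer as a generalized comparison sign(<alpha*h1 - beta*h2, x>)
    on the original instance, with alpha = 1/||T h1|| and beta = 1/||T h2||."""
    alpha = 1.0 / math.hypot(*mat_vec(T, h1))
    beta = 1.0 / math.hypot(*mat_vec(T, h2))
    return sign(dot(x, [alpha * a - beta * b for a, b in zip(h1, h2)]))

if __name__ == "__main__":
    T = [[2.0, 1.0], [0.0, 1.0]]  # arbitrary invertible matrix, for illustration only
    h1, h2, x = [1.0, 0.0], [0.0, 1.0], [5.0, 4.0]
    print(comparison_after_transform(T, h1, h2, x),
          generalized_comparison_on_original(T, h1, h2, x))
```

The two functions agree because $\langle h', x'\rangle = \langle h, x\rangle / \|Th\|$ for every $h$; in the example the plain comparison $\mathrm{sign}(\langle h_1 - h_2, x\rangle)$ has the opposite sign, which shows that the re-weighting matters.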
From now on, we focus on proving Lemma 3.2. To this end, we assume from now that is in -isotropic position for . Note that is inferred from comparisons on if and only if is, and that replacing an element of with its negation does not affect . Therefore, negating elements of does not change the expected number of elements inferred from comparisons on . Therefore, we may assume in the analysis that for all . Under this assumption, we will show that
It is convenient to analyze the following procedure for sampling :
Sample random points in , and uniformly at random.
We will analyze the probability that comparisons on infer at . Our proof relies on the following observation.
The probability, according to the above process, that is equal to the expected fraction of whose label is inferred. That is,
Thus, it suffices to show that . This is achieved by the next two propositions as follows. Proposition 3.4 shows that is in a -approximate isotropic position with probability at least , and Proposition 3.5 shows that whenever is in (1/2)-approximate isotropic position then with probability at least . Combining these two propositions together yields that and finishes the proof of Lemma 3.2.
Let be a set of unit vectors in -approximate isotropic position for . Let be a uniformly sampled subset of size . Then is in -approximate isotropic position with probability at least .
Let , be in (1/2)-approximate isotropic position, where . Let be sampled uniformly. Then
We may assume that is a unit vector, namely . Let and assume that with
Set . As is in (1/2)-approximate isotropic position, Claim 2.8 gives that for at least many . Set and define
where by our assumption . Note that in this case, we can compute from comparison queries on . We will show that
from which the proposition follows. This in turn follows by the following two claims, whose proof we present shortly.
Let . Assume that there exists a non-negative linear combination of such that
The assumption of Claim 3.6 holds for at least half the vectors in .
Proof of Claim 3.6.
Let and . As is in (1/2)-approximate isotropic position then is in -approximate isotropic position for . In particular, as we have . By applying comparison queries to we can sort . Then can be computed as the set of the elements with the largest inner product. Claim 2.8 applied to then implies that for all . Crucially, we can deduce this just from the comparison queries on , together with our initial assumption that is in (1/2)-approximate isotropic position. Thus we deduced from our queries that:
In addition, from our assumption it follows that . These together infer that . ∎
Let be unit vectors. For any , if then there exist and such that
In order to derive Claim 3.7 from Claim 3.8, we assume that . Then we can apply Claim 3.8 iteratively times with parameter , at each step identify the required , remove it from and continue. Next we prove Claim 3.8.
Proof of Claim 3.8.
Let denote the Euclidean ball of radius , and let denote the convex hull of . Observe that , as each is a unit vector. For define
We claim that having guarantees that there exist distinct for which
This follows by a packing argument: if not, then the sets for are mutually disjoint. Each has volume , and they are all contained in which has volume . As the number of distinct is we obtain that , which contradicts our assumption on .
Let be maximal such that . We may assume without loss of generality that , as otherwise we can swap the roles of and . Thus we have
Adding to both sides gives
which is equivalent to
The claim follows by setting and noting that by our construction , and hence the sum terminates at . ∎
3.2 A deterministic LDT for $H$ in general position
In this section, we derandomize the algorithm from the previous section. We still assume that $H$ is in general position; this assumption will be removed in the next sections.
Let be a finite set in general position. Then there exists an LDT that computes with generalized comparison queries.
Note that this bound is worse by a factor of $d$ than the one in Lemma 3.1. In Open problem 2 we ask whether this loss is necessary, or whether it can be avoided by a different derandomization technique.
Lemma 3.9 follows by derandomizing the algorithm from Lemma 3.1. Recall that Lemma 3.1 boils down to showing that for an fraction of on average. In other words, for every input vector , most of the subsets of size allow one to infer from comparisons the labels of some -fraction of the points in . We derandomize this step by showing that there exists a universal set of size that allows one to infer the labels of some -fraction of the points in , with respect to any . This is achieved by the next lemma.
Let be a set of unit vectors in isotropic position. Then there exists of size such that
We use a variant of the double-sampling argument due to [VC71] to show that a random of size satisfies the requirements. Let be a random (multi-)subset of size , and let denote the event
Our goal is showing that . To this end we introduce an auxiliary event . Let , and let be a subsample of , where each is drawn uniformly from and independently of the others. Define to be the event
The following claims conclude the proof of Lemma 3.10.
If then .
Proof of Claim 3.11.
Assume that . Define another auxiliary event as
To this end, fix such that both and hold. That is: is in -approximate isotropic position, and there exists such that . If we now sample , in order for to hold, we need that (i) , which holds with probability one, as ; and (ii) that . So, we analyze this event next.
Applying Lemma 3.2 to the subsample with respect to gives that
This then implies that
To conclude: we proved under the assumptions of the lemma that ; and that for every which satisfies it holds that . Together these give that . ∎
Proof of Claim 3.12.
We can model the choice of as first sampling of size , and then sampling of size . We will prove the following (stronger) statement: for any choice of ,
So from now on, fix and consider the random choice of . We want to show that:
We would like to prove this statement by applying a union bound over all . However, is an infinite set and therefore a naive union bound is not directly applicable. To this end we introduce a suitable equivalence relation that is based on the following observation.
is determined by for .
We thus define an equivalence relation on where if and only if for all . Let be a set of representatives for this relation. Thus, it suffices to show that
Since is finite, a union bound is now applicable. Specifically, it is enough to show that
Now, (a variant of) Sauer’s Lemma (see e.g. Lemma 2.1 in [KLM17]) implies that
Fix . If