Entropy, Triangulation, and Point Location
In Planar Subdivisions
Abstract
A data structure is presented for point location in connected planar subdivisions when the distribution of queries is known in advance. The data structure has an expected query time that is within a constant factor of optimal. More specifically, an algorithm is presented that preprocesses a connected planar subdivision of size and a query distribution to produce a point location data structure for . The expected number of pointline comparisons performed by this data structure, when the queries are distributed according to , is where is a lower bound on the expected number of pointline comparisons performed by any linear decision tree for point location in under the query distribution . The preprocessing algorithm runs in time and produces a data structure of size . These results are obtained by creating a Steiner triangulation of that has nearminimum entropy.
ENTROPY, TRIANGULATION, AND POINT LOCATION
IN PLANAR SUBDIVISIONS
Sébastien Collette^{†}^{†}thanks: Chargé de Recherches du F.R.S.FNRS. 
Université Libre de Bruxelles and Vida Dujmović^{†}^{†}thanks: Research partially supported by NSERC and the Ontario Ministry of Research and Innovation. 
Carleton University and John Iacono^{†}^{†}thanks: Research partially supported by NSF grants OISE0334653 and CCF0430849 and by an Alfred P. Sloan research fellowship. 
Polytechnic Institute of NYU and Stefan Langerman^{†}^{†}thanks: Maître de recherches du F.R.S.FNRS. 
Université Libre de Bruxelles and Pat Morin^{†}^{†}thanks: This research began while the author was a visiting researcher at the Université Libre de Bruxelles and was completed while the author was a visiting researcher at National ICT Australia and the University of Sydney. Research partially supported by NSERC, FNRS, the Ontario Ministry of Research and Innovation, Canada Foundation for Innovation, The University of Sydney, and National ICT Australia. 
Carleton University 
1 Introduction
The planar point location problem is the classic search problem in computational geometry. Given a planar subdivision ,^{1}^{1}1A planar subdivision is a partitioning of the plane into points (called vertices), open line segments (call edges), and maximal connected 2dimensional regions (called faces). the planar point location problem asks us to construct a data structure so that, for any query point , we can quickly determine which face of contains .^{2}^{2}2In the degenerate case where is a vertex or contained in an edge of any face incident on that vertex/edge may be returned as an answer.
The history of the planar point location problem parallels, in many ways, the history of binary search trees. After a few initial attempts [11, 18, 21], asymptotically optimal (and quite different) linearspace query time solutions to the planar point location problem were obtained by Kirkpatrick [17], Sarnak and Tarjan [24], and Edelsbrunner et al. [12] in the mid 1980s. These results were based on hierarchical simplification, data structural persistence, and fractional cascading, respectively. All three of these techniques have subsequently found many other applications. An elegant randomized solution, combining aspects of all three previous solutions, was later given by Mulmuley [20], and uses randomized incremental construction, a technique that has since become pervasive in computational geometry [10, Section 9.5]. Preparata [22] gives a comprehensive survey of the results of this era.
In the 1990s, several authors became interested in determining the exact constants achievable in the query time. Goodrich et al. [13] gave a linearsize data structure that, for any query, requires at most pointline comparisons and conjectured that this query time was optimal for linearspace data structures.^{3}^{3}3Here and throughout, logarithms are implicitly base 2 unless otherwise specified. The following year, Adamy and Seidel [1] gave a linearspace data structure that answers queries using pointline comparisons and showed that this result is optimal up to the third term.
Still not done with the problem, several authors considered the point location problem under various assumptions about the query distribution. All these solutions compare the expected query time to the entropy bound; in a planar subdivision with faces , if is the probability that is contained in , then no algorithm that makes only binary decisions can answer queries using an expected number of decisions that is fewer than
(1) 
In the previous results on planar point location, none of the query times are affected significantly by the structure of ; they hold for arbitrary planar subdivisions. However, when studying point location under a distribution assumption the problem becomes more complicated and the results become more specific. A connected subdivision is a planar subdivision whose underlying (vertex and edge) graph is connected. A convex subdivision is a planar subdivision whose faces are all convex polygons, except the outer face, which is the complement of a convex polygon. A triangulation is a convex subdivision in which each face has at most 3 edges on its boundary.
Note that, if every face of has a constant number of sides, then can be augmented, by the addition of extra edges, so that it is a triangulation without increasing (1) by more than a constant. Similarly, in a convex subdivision where the query distribution is uniform within each face of , the faces of the subdivision can be triangulated without increasing the entropy by more than a constant [3]. Thus, in the following we will simply refer to results about triangulations where it is understood that these also imply the same result for planar subdivisions with faces of constant size or convex subdivisions when the query distribution is uniform within each face.
Arya et al. [2] gave two results for the case where the query point is chosen from a known distribution where the and coordinates of are chosen independently and is a convex subdivision. They gave an space data structure for which the expected number of pointline comparisons is at most and an space data structure for which the expected number of pointline comparisons is at most . The assumption about the independence of the and coordinates of is crucial to the these results.
For arbitrary distributions that are known in advance, several results exist. Iacono [15, 16] showed that, for triangulations, a simple variant of Kirkpatrick’s original point location structure gives a linear space, expected query time data structure. Simultaneously, and independently, Arya et al. [5] showed that a variant of Mulmuley’s randomized data structure also achieves expected query time. A sequence of papers by Arya et al. [3, 4, 6] has recently culminated in an space data structure for point location in triangulations with query time [6].
In the current paper, we show that, for any connected planar subdivision, there exists a data structure of size that can answer point location queries using point/line comparisons. Here, is a lower bound on the expected cost of any linear decision tree that solves this problem. Note that is often greater than the quantity defined above and this is necessarily so. To see this, consider that the problem of testing whether a query point is contained in a simple polygon with vertices is a special case of planar point location in a connected planar subdivision. However, in this special case the subdivision only has 2 faces, so . It seems unlikely that, for any simple polygon and any probability measure over , it is always possible to test in expected time if a point drawn from is contained in . Indeed, it is not hard to design a convex polygon and distribution so that the expected cost of any algebraic decision tree for point location in , under query distribution , is .
Note that all known algorithms for planar point location that do not place special restrictions on the input subdivision can be described in the linear decision tree model of computation.^{4}^{4}4Although significant breakthroughs have recently been made in this area [8, 23], we deliberately do not survey algorithms that require the vertices of the subdivision to be on integer coordinates. The data structures presented in the current paper are the most general results known about planar point location and imply, to within a lower order term, all of the results discussed in the introduction.
We achieve our results by showing how to compute a Steiner triangulation of that has nearly minimum entropy over all possible triangulations of and then proving that the entropy of a minimumentropy Steiner triangulation of is a lower bound on the cost of any linear decision tree for point location in . By then applying the recent result of Arya et al. to the Steiner triangulation we obtain upper and lower bounds that match to within a lowerorder term.
A preliminary version of this paper, which dealt only with convex subdivisions, has appeared in the Proceedings of the 19th ACMSIAM Symposium on Discrete Algorithms (SODA 2008) [9].
The remainder of this paper is organized as follows: Section 2 presents definitions and notations used throughout the paper. Section 3 shows how to compute a nearminimumentropy triangulation of a simple polygon. Finally, Section 4 applies these tools to obtain our point location structure for connected planar subdivisions.
2 Preliminaries
In this section we give definitions, notation, and background required in subsequent sections.
Interiors and Boundaries.
For a set , we denote the boundary of by and the interior of by . The closure of is denoted by .
Triangles and Convex Polygons.
For the purposes of this paper, a triangle is the common intersection of at most 3 closed halfplanes. This includes triangles with infinite area and triangles having 0, 1, 2, or 3, vertices. Similarly, a convex gon is the common intersection of at most closed halfplanes.
For a closed region , a triangulation of is a set of triangles whose interiors are pairwise disjoint and whose union is . We use the convention that, unless is explicitly mentioned, the triangulation in question is a triangulation of . This definition of a triangulation is often referred to as a Steiner triangulation since it allows vertices of the triangles to be anywhere in , and not at some finite predefined set of locations.
Simple Polygons, Pseudotriangles, and Geodesic Triangles.
A (nearsimple) polygon is a closed subset of whose boundary is piecewise linear and such that is homeomorphic to an open disk. Note that this definition of a polygon implies that every bounded face of a connected planar subdivision is a polygon. Also, triangles, as defined above, are polygons. Note that nearsimple polygons are slightly more general than simple polygons, for which is a simple closed curve. However, our definition is sufficiently close that algorithms designed for simple polygons continue to work with nearsimple polygons.
A reflex chain in a polygon is a consecutive sequence of vertices of , where the internal angle at is at least , for all . A pseudotriangle is a polygon whose boundary consists of 3 reflex chains. An convex pseudotriangle () is a pseudotriangle in which of the reflex chains consist of single line segments.
A shortest path between points , denoted is a curve of minimum length that is contained in and that has and as endpoints. For 3 points, , a geodesic triangle in , denoted is the union of all shortest paths of the form , where . Geodesic triangles are closely related to pseudotriangles. In particular, every geodesic triangle consists of a pseudotriangle and three paths joining the three convex vertices of to , , and .
Classification Problems and Classification Trees.
A classification problem over a domain is a function . A ary classification tree is a full ary tree^{5}^{5}5A full ary tree is a rooted ordered tree in which each nonleaf node has exactly children. in which each internal node is labelled with a function and for which each leaf is labelled with a value . The search path of an input in a classification tree starts at the root of and, at each internal node , evaluates and proceeds to the th child of . We denote by the label of the final (leaf) node in the search path for . We say that the classification tree solves the classification problem over the domain if, for every , .
In this paper, we are especially concerned with linear decision trees. These are binary classification trees for a problem over the domain . Each internal node of a linear decision tree contains a linear inequality , and the node evaluates to 1 or 0 depending on whether the query point satisfies the inequality or not, respectively. Geometrically, each internal node of is labelled with a directed line and the decision to go to the left or right child depends on whether is to the left or right (or on) this line. An immediate consequence of this is that, for each leaf of , the closure of is a convex polygon.
Probability.
Throughout this paper is a probability measure over that represents the query distribution. The notation denotes the probability of event under the probability measure . The notation denotes the conditional probability of given , i.e., . For any set , we use the shorthand to denote .
For a set of subsets of , we define the induced entropy of , denoted by as . For two sets with , the joint entropy of and , is . It is wellknown that (see, for example, Gray [14, Lemma 2.3.2]).
We will sometimes abuse terminology slightly by referring to a triangulation of as a partition of into triangles, although strictly speaking this is not true since the triangles in are closed sets that overlap at their boundaries. We will then continue the abuse by computing the induced entropy of . This introduces a technical difficulty in that and inequality is possible if there exists sets such that the area of is 0 and . To avoid this technical difficulty, we will assume that is nice in the sense that, if the area of is 0 then . This implies that, for every in . This assumption will avoid lengthy technical but uninteresting cases in our analysis. In practice, this problem can be avoided by using a symbolic perturbation of the query point.
The probability measures used in this paper are usually defined over . We make no assumptions about how these measures are represented, but we assume that an algorithm can, in constant time, perform each of the following two operations:

given a triangle , compute , and

given a triangle and a point at the intersection of two of ’s supporting lines, compute a line that contains and that partitions into two open triangles and such that .
Requirement 2 is used only for convenience in describing our data structure. It is not strictly necessary, but its use greatly simplifies the exposition of our results. To eliminate requirement 2, one can use the same method described by Collette et al. [9, Section 5].
For a classification tree that solves a problem and a probability measure over , the expected search time of is the expected length of the search path for when is drawn at random from according to . Note that, for each leaf of there is a maximal subset such that the search path for any ends at . Thus, the expected search time of (under distribution ) can be written as
where denotes the leaves of and denotes the length of the path from the root of to . For any tree we use to denote the vertices of .
The following theorem, which is a restatement of (half of) Shannon’s Fundamental Theorem for a Noiseless Channel [25, Theorem 9], is what all previous results on distributionsensitive planar point location use to establish their optimality:
Theorem 1 (Fundamental Theorem for a Noiseless Channel).
Let be a classification problem and let be selected from a distribution such that , for . Then, any ary classification tree that solves has
(2) 
3 Minimum Entropy Triangulations
Let be a simple polygon with vertices, denoted as they occur, in counterclockwise order, on the boundary of . We will show how to find a triangulation of that has nearminimum entropy. That is, we will find a triangulation such that is nearminimum over all triangulations of . In order to shorten the formulas in this section, we will implicitly condition the distribution on . More precisely, throughout this section the notation should be treated as shorthand for .
3.1 The Triangulation
Our triangulation algorithm is recursive and takes as input a polygon and a reflex chain on the boundary of . If is a triangle, then there is nothing to do, so the algorithm outputs and terminates. Otherwise, the algorithm first selects a point on the boundary of and adds all the edges of the geodesic triangle to the triangulation . Observe that removing from disconnects into components where is a polygon that shares a reflex chain with the pseudotriangle (see Figure 1). The point is selected in such a way that, for all , .^{6}^{6}6The existence of such a point is readily established by a standard continuity argument; see Bose et al. [7] for an example. Each of the subpolygons can then be triangulated recursively by applying the algorithm to and the reflex chain .
To complete the triangulation all that remains is to partition into triangles. To do this, we first partition into at most one triangle and three 2convex pseudotriangles as shown in Figure 2.a. Let be the connected component of that contains . To complete the triangulation we will partition into triangles, for each , using a recursive algorithm. This algorithm selects an edge of the reflex chain in and extends in both directions until it reaches the boundary of (see Figure 2.b). The resulting line segment partitions into a triangle , and two 2convex pseudotriangles and that are triangulated recursively. At the same time, is partitioned into up to 4 pieces (see Figure 2.c):

the triangle , and

a subpolygon incident to ,

The two connected components and of that contain and , respectively.
The edge is selected so that for each .^{7}^{7}7The existence of such an edge is assured by yet another continuity argument. This completes the description of the triangulation . A partially completed triangulation is show in Figure 3.
(a)  (b)  (c) 
3.2 The Tree
In order to study the entropy of the triangulation defined above, we will impose a tree structure on the pieces of induced by the triangles in . The tree for is a tree whose nodes are subpolygons of and which has the property that, for any node that is the child of a node , .
The tree has three different kinds of nodes, called Pnodes, Tnodes, and Qnodes. The root of is the polygon and is a Pnode. The root of has the following children (defined in terms of the construction algorithm in the previous section; see Figure 4):

Each subpolygon whose boundary does not share a segment with is a child of and is a Pnode.

The subpolygon is a child of and is called a Tnode.
The subpolygon has three children that are called Qnodes. The subtree rooted at is a ternary tree corresponding to the recursive partitioning of and done by the algorithm. The leaves of this subtree are Pnodes and the internal nodes of this subtree are Qnodes. Each internal node has up to 3 children, up to 1 of which may be a Pnode corresponding to a subpolygon and up to two of which may be Qnodes.
Note that the above definition yields a tree whose leaves are Pnodes that correspond to the subpolygons obtained by removing from . The subtree rooted at each such leaf is obtained recursively from the recursive triangulation of .
Now that we have defined the tree , we study some of its properties. Our first lemma says that does a good job of splitting based on its probabilities.
Lemma 1.
Let be a polygon, let be a probability measure over , and let be the tree for . Let be a node of whose depth is . Then .
Proof.
Assume , otherwise there is nothing to prove. Let be the fourth node on the path from the root, , of to . Let be the path in from to . If has at least two Pnodes, then . Otherwise, the second node in is a Tnode followed by 2 Qnodes. By construction, for any Qnode whose parent is a Qnode , . Therefore, . If , then and the proof is complete. Otherwise, apply the same argument inductively on the path from to , to obtain
and this completes the proof. ∎
Our next lemma says that a single line segment does not intersect very many high probability triangles in .
Lemma 2.
Let be a polygon, let be a probability measure over , and let be the tree for , let be a line segment, and let be the set of all vertices that are distance at most from the root of and such that intersect . Then
Proof.
There are 3 types of nodes in whose interiors intersect : Type 1 nodes contain one endpoint of in their interior, Type 2 nodes contain both endpoints of in their interior, and Type 0 nodes contain no endpoints of in their interior. Notice that each level of contains at most 1 Type 2 node or 2 Type 1 nodes, so the total number of Type 1 and Type 2 nodes at distance at most from the root is at most . Thus, all that remains is to bound the number of Type 0 nodes whose distance from the root is at most .
Let be any Pnode such that does not contain either endpoint of . Since is a Pnode, there is a reflex chain on the boundary of that is a shortest path between two points on the boundary of , and every path in from to intersects . Stated another way, the interior of a Pnode does not intersect unless it contains at least one endpoint of . Therefore, all Type 0 nodes are either Tnodes or Qnodes.
For every Type 0 node , there is a path in from to a Tnode that is adjacent to a Type 1 or Type 2 node. Furthermore, the path consists of followed by 0 or more Qnodes, and terminates with a Tnode. Looking more closely at the definition of Qnodes, we see that two sibling Qnodes and are not mutually visible, i.e., there is no line segment that intersects both and .
All of this implies that each of the at most Type 1 or Type 2 nodes is adjacent to at most 1 Type 0 Tnode, and this Tnode is the endpoint of at most 3 paths of Type 0 Qnodes. Each such path is of length at most . Therefore, the total number of nodes in that intersect is at most . ∎
3.3 MinimumEntropy Triangulation
Next, we show that the triangulation defined above is nearlyminimum entropy over all possible triangulations of . We do this by developing a technique for lowerbounding the entropy of one triangulation in terms of the entropy of another triangulation. We then show how to apply this technique to lower bound the entropy of any triangulation in terms of the entropy of .
To obtain lower bounds on the entropy of a triangulation , consider the following easily proven observation: If each triangle in intersects at most triangles of some triangulation then .^{8}^{8}8Proof: Consider the set . Each triangle of contributes at most pieces to , so we have . This observation allows us to use to prove a lower bound on the entropy of a triangulation . Unfortunately, the condition that each triangle of intersect at most triangles of is too restrictive for our purposes. Instead, we require following stronger result:
Lemma 3.
Let be a probability measure over . Let and be triangulations, and let be a partition of the triangles in . Suppose that, for all and for each triangle , intersects at most triangles in . Then
Intuitively, Lemma 3 can be thought of as follows: If we tell an observer which of the a point drawn according to occurs in then the amount of information we are giving the observer about the experiment is at most . However, after giving away this information, we are able to apply the simple observation in the previous paragraph, since each triangle in intersects at most elements of each . Thus, Lemma 3 is really just applications of the simple observation. The following proof formalizes this:
Proof.
and this completes the proof. ∎
The remainder of our argument involves partitioning the triangles of into subsets and then showing that and are not too big. To help us, we will use the tree . For a node in with children , let be the portion of not covered by ’s children. Note that is always either the empty set or is a triangle in (see Figure 4). In fact, for every triangle , there is exactly one such that , and for every such that is nonempty there is exactly one such that . This implies that^{9}^{9}9Here, and throughout the remainder, we slightly abuse notation by using the convention that .
For a node , we define is the probability that a point drawn from is contained in .
Next we apply Lemma 3 to obtain a lower bound on the entropy of any triangulation .
Lemma 4.
Let be a simple polygon, let be a probability measure over , and consider the triangulation . Then, for any triangulation of ,
Proof.
Let be the tree for . Partition the nodes of into groups where
In the following we will fix a value , , to be defined later. A group is large if it contains at least elements, otherwise is small. Let denote the index set of the large groups, i.e., . Let be the index set of the small groups.
Note that, for any group , Lemma 1 ensures that all elements of have depth at most in . Therefore, Lemma 2 ensures that any triangle of intersects at most triangles of . Therefore, applying Lemma 3 with , we obtain:
(3) 
Thus, all that remains is to bound the contribution of the last two terms on the right hand side of (3). First,
where the last equality follows from Jensen’s Inequality. Finally, we show that the contribution of is at most .
where the last equality is obtained using the Taylor series expansion for to obtain the inequality for close to 0. Continuing, we get
Where the last inequality is obtained by setting . Thus, we have shown that
(4) 
which implies that . Applying this to the right hand side of (4) yields , completing the proof. ∎
Lemma 4 shows that the triangulation defined previously is nearly minimumentropy over all triangulations of . The following theorem gives an algorithmic version of Lemma 4.
Theorem 2.
Let be a simple polygon with vertices, and let be a probability measure over . Then there exists an time algorithm that computes a triangulation of having triangles and such that, for any triangulation of ,
Proof.
We show how the construction of the triangulation described in Section 3.1 can be modified to run in time. When constructing the first step is to find the third vertex of the geodesic triangle . This can be accomplished in time by computing the shortest path trees from and to all other vertices of and using these to find . For an example of a similar computation, see Bose et al. [7, Section 2.2].
Next, is split into three 2convex pseudotriangles , which is easily accomplished in time. The last step, before recursing, is to triangulate each of . This step can be accomplished in time using a 2sided exponential searching trick that was used by Mehlhorn [19] in the construction of biased binary search trees (see also, Collette et al. [9, Theorem 1]).
Finally, the algorithm recurses on each of the pieces . In this way, we obtain a divideandconquer algorithm for constructing . Unfortunately, this algorithm may have running time since there is no bound significantly smaller than on the size of an individual subproblem . To overcome this, before recursing on a subproblem we check if it contains more than vertices. If so, then rather than recursing normally on we choose a geodesic triangle , one of whose sides is the reflex chain and such that removing from leaves a set of subpolygons each with at most vertices. This modification then yields an algorithm whose recursion tree has depth and at which the work done at each level is , so the total running time of this algorithm is .
Note that this algorithm yields a triangulation that is different from . In particular, there may exist one with . Despite this, all the proofs of Lemmas 1–4 continue to hold almost without modification. The only difference occurs in Lemma 1, which now only guarantees a bound of on the number of black edges, but this has almost no effect on subsequent computations.
Finally, to see that contains triangles, we count the different types of edges used in the triangulation . Some of these edges are edges of , of which there are at most . Some of these edges are edges of geodesic triangles, which always connect two vertices of and do not cross each other, so there are at most of these. The remaining edges are used to triangulate the interiors of pseudotriangles. A pseudotriangle that has vertices is triangulated using edges. Since the total number of vertices in all pseudotriangles is at most , this means that there are at most edges used to triangulate pseudotriangles. Therefore, the total number of edges used by triangles in , and hence the number of triangles in , is . ∎
4 Point Location in Simple Planar Subdivisions
Next we consider the problem of point location in simple subdivisions. The following theorem of Arya et al. [6] shows that a low entropy triangulation can be used to make a good point location structure.
Theorem 3 (Arya et al. 2007).
Let be a probability measure over and let be a triangulation of having a total of triangles. Then there exists a data structure of size that can be constructed in time, and for which the expected number of point/line comparisons required to locate the face of containing a query point , drawn according to , is .
The following lemma shows that the entropy of a minimumentropy triangulation gives a lower bound on the cost of any point location structure.
Lemma 5.
Let be any linear decision tree for a classification problem over . Then there exists a linear decision tree for , such that, for each leaf of , is a triangle and satisifies
for any probability measure over .
Proof.
Each leaf of has a region that is a convex polygon. If has sides then the depth of in is at least . To obtain the tree replace each such leaf of by a balanced binary tree of depth by repeatedly splitting the leaf into two children and whose regions have and vertices. For a leaf , let denote the set of leaves in in the subtree of . Then
where the last inequality is an application of Jensen’s Inequality. ∎
Lemma 5 says that for any linear decision tree for point location, there is an underlying triangulation. The entropy of this triangulation gives a lower bound on the cost of the decision tree. Thus, the entropy of a minimum entropy triangulation gives a lower bound on the expected cost of any linear decision tree for point location.
Keeping the above in mind, our point location structure is simple. Let be a connected planar subdivision whose faces are and let be a probability measure over . We assume, without loss of generality that the outer face of is the complement of a triangle, since otherwise we can add at most 3 vertices and 4 edges to to make this true. Adding these edges will not increase the entropy the minimum weight triangulation of by more than a constant. With this assumption, testing if the query point is in the outer face of can be done using 3 linear comparisons after which we may safely assume that the query point is contained in an internal face of .
We triangulate each internal face of (a nearsimple polygon) using Theorem 2 to obtain a triangulation . The union of all is a triangulation of , to which we apply Theorem 3 to obtain a point location structure for point location in and hence also in . The following theorem shows that is nearly optimal:
Theorem 4.
Given a connected planar subdivision with vertices and a probability measure over , a data structure of size can be constructed in time that answers point location queries in . The expected number of point/line comparisons performed by , for a point drawn according to is
where is any linear classification tree that answers point location queries in .
Proof.
The space and preprocessing requirements follow from Theorem 2 and Theorem 3. To prove the bound on the expected query time, apply Lemma 5 to the tree and consider the resulting tree , each of whose leaves have regions that are triangles and such that
(5) 
Observe that each leaf of corresponds to a triangle in that is completely contained in one of the faces of . Let denote this set of triangles and let denote the subset of contained in . Consider the entropy of the distribution induced by the leaves of :
(6) 
Similarly, the entropy of is given by