Peeling potatoes near-optimally in near-linear time

A preliminary version of this paper appeared in Proc. 30th Annual Symposium on Computational Geometry (SoCG 2014), pp. 224–231.
We consider the following geometric optimization problem: find a convex polygon of maximum area contained in a given simple polygon with vertices. We give a randomized near-linear-time -approximation algorithm for this problem: in time we find a convex polygon contained in that, with probability at least , has area at least times the area of an optimal solution. We also obtain similar results for the variant of computing a convex polygon inside with maximum perimeter.
To achieve these results we provide new results in geometric probability. The first result is a bound relating the probability that two points chosen uniformly at random inside are mutually visible and the area of the largest convex body inside . The second result is a bound on the expected value of the difference between the perimeter of any planar convex body and the perimeter of the convex hull of a uniform random sample inside .
Keywords: geometric optimization; potato peeling; visibility graph; geometric probability; approximation algorithm.
1 Introduction
We consider the algorithmic problem of finding a maximum-area convex set in a given simple polygon $P$. Thus, we are interested in computing

$$A(P) = \sup\{\operatorname{area}(K) \mid K \subseteq P,\ K \text{ convex}\}.$$
The problem was introduced by Goodman [25], who named it the potato peeling problem. Goodman also showed that the supremum is actually achieved, so we can replace it by the maximum. Henceforth we use $n$ to denote the number of vertices in the input polygon $P$.
Chang and Yap  showed that can be computed in time. Since there have been no improvements in the running time of exact algorithms, it is natural to turn our attention to faster approximation algorithms. A step in this direction was made by Hall-Holt et al. , who showed how to obtain a constant-factor approximation in time.
In this paper we present a randomized -approximation algorithm. Besides the simple polygon , the algorithm takes as input a parameter controlling the approximation. In time the algorithm returns a convex polygon contained in that, with probability at least , has area at least . For any constant , and more generally for any , the running time becomes . As usual, the probability of error can be reduced to using independent repetitions of the algorithm. Note that for , the exact algorithm of Chang and Yap  is faster as it runs in time .
Overview of the approach.
Let be a set of points contained in . The visibility graph of , denoted by , has as vertex set and, for any two points and in , the edge is in whenever the segment is contained in . See Figure 1.
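The definition of the visibility graph can be illustrated with a small brute-force sketch. It is not the construction used in the paper: it tests every pair of sample points against every polygon edge, and it assumes general position (sample points strictly inside the polygon, and no segment between two sample points passing through a polygon vertex). The function names are hypothetical.

```python
from itertools import combinations

def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def properly_cross(a, b, c, d):
    """True if segments ab and cd cross at a point interior to both."""
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)

def inside(poly, p):
    """Even-odd ray casting; p is assumed not to lie on the boundary."""
    x, y = p
    result = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            result = not result
    return result

def visible(poly, p, q):
    """True if segment pq is contained in the simple polygon (general position)."""
    if not (inside(poly, p) and inside(poly, q)):
        return False
    edges = zip(poly, poly[1:] + poly[:1])
    return not any(properly_cross(p, q, a, b) for a, b in edges)

def visibility_graph(poly, pts):
    """Set of index pairs (i, j), i < j, whose points see each other."""
    return {(i, j) for i, j in combinations(range(len(pts)), 2)
            if visible(poly, pts[i], pts[j])}

# L-shaped polygon: the two points in the arms do not see each other.
L_SHAPE = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
POINTS = [(0.5, 0.5), (1.8, 0.5), (0.5, 1.8)]
GRAPH = visibility_graph(L_SHAPE, POINTS)
```

This brute-force test takes time proportional to the number of point pairs times the number of polygon edges, so it only serves to make the definition concrete.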
Let us assume that the set of points is obtained by uniform sampling in . We note the following properties:
For each convex polygon , the area of the convex hull is similar to the area of , provided that is large enough. For this, it is convenient to have large .
For each convex polygon , the boundary of is made of edges in .
With dynamic programming one can find a maximum-area convex polygon defined by edges of . For this to be efficient, it is convenient that has few edges.
Thus, we have a trade-off on the number of points in that are needed. We argue that there is a suitable size for such that has a near-linear expected number of edges and, with reasonable probability, the edges of give a good inner approximation to an optimal solution. Instead of finding the optimal solution directly in , we search in a small parallelogram of area around each edge of , performing a second sampling. The core of the argument is a bound relating and the probability that two random points in are visible. Such a relation was previously unknown, and we believe that it is of independent interest. See Theorems 9 and 10, and the follow-up work  (summarized here in Theorem 11), for the precise relations.
We are also interested in finding a convex polygon inside with maximum perimeter. Let denote the perimeter of a convex body . When is a segment, is twice the length of . Let
By the same compactness argument as used by Goodman [25, Proposition 1], using the Blaschke selection theorem, the supremum is achieved and so it can be replaced by the maximum.
We provide a randomized algorithm to compute a convex polygon (or segment) inside whose perimeter is at least . For every , to succeed with probability , the algorithm uses time
The main obstacle in this case is that the polygons with near-optimal perimeter may be very skinny and thus have arbitrarily small area. For that case, random sampling of points is futile, but we can use a longest segment contained in to approximate . More precisely, if the perimeter-optimal convex polygon has aspect ratio , then we can -approximate it via a longest segment inside , which in turn can be -approximated in near-linear time . If the perimeter-optimal polygon has aspect ratio , then it has area at least , and the approach based on random samples of points can be adapted, with a larger number of sample points. To bound the number of sample points we use a new theorem in geometric probability bounding the expected difference between the perimeter of any planar convex body and the perimeter of the convex hull of a random sample inside . See our Theorem 18 for the precise statement.
Other related work.
There have been several results about finding maximum-area objects of a certain type inside a given simple polygon. DePano, Ke and O’Rourke  consider squares and equilateral triangles, Daniels, Milenkovic and Roth  consider axis-parallel rectangles, and Melissaratos and Souvaine  consider arbitrary triangles. Subquadratic algorithms to find a longest segment contained in a simple polygon were first given by Chazelle and Sharir  and improved by Agarwal, Sharir and Toledo [1, 2]. Hall-Holt et al.  present near-linear time algorithms for a -approximation of the longest segment.
Aronov et al.  consider a variation where the search is restricted to convex polygons whose edges are edges of a given triangulation (with inner points) of . They show how to compute a maximum-area convex polygon for this model in time, where is the number of edges in the triangulation.
Dumitrescu, Har-Peled and Tóth  consider the following problem: given a unit square and a set of points inside , find a maximum-area convex body inside that does not have any point of in its interior. This is an instance of the potato peeling problem for polygons with holes. They provide a -approximation in time . For any fixed , the running time is quadratic. Our algorithm exploits the absence of holes in , so it does not produce an improvement in this case.
The potato peeling problem can be understood as finding a largest set of points that are mutually visible. Rote  showed how to compute in polynomial time the probability that two random points inside a polygon are visible. A faster algorithm has been proposed by Buchin et al. . Cheong, Efrat and Har-Peled  consider the problem of finding a point in a simple polygon whose visibility region is maximized. They provide a -approximation algorithm using near-quadratic time. The approach is based on taking a random sample of points in the polygon, constructing the visibility region of each point, and taking a point lying in most visibility regions.
We will have to generate points uniformly at random inside a triangle. For this, we will assume that a random number in the interval can be generated in constant time.
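One standard way to generate a uniform random point in a triangle from two uniform random numbers is sketched below. The fold of the unit square onto the triangle is a common textbook technique and is not necessarily the one the authors have in mind. The helper `sample_in_triangulation`, which picks a triangle with probability proportional to its area, is an assumption added for illustration; it does not aim at any particular running time.

```python
import random

def triangle_area(a, b, c):
    """Area of triangle abc."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])) / 2.0

def sample_in_triangle(a, b, c, rng=random):
    """Uniform random point in triangle abc from two uniform numbers in [0, 1)."""
    u, v = rng.random(), rng.random()
    if u + v > 1.0:
        # Fold the upper half of the unit square onto the region u + v <= 1.
        u, v = 1.0 - u, 1.0 - v
    return (a[0] + u * (b[0] - a[0]) + v * (c[0] - a[0]),
            a[1] + u * (b[1] - a[1]) + v * (c[1] - a[1]))

def sample_in_triangulation(triangles, k, rng=random):
    """k uniform points in the union of interior-disjoint triangles (hypothetical helper)."""
    weights = [triangle_area(*t) for t in triangles]
    chosen = rng.choices(triangles, weights=weights, k=k)
    return [sample_in_triangle(a, b, c, rng) for a, b, c in chosen]

# Usage: 200 points in the unit square, given as two triangles.
RNG = random.Random(1)
SQUARE = [((0, 0), (1, 0), (0, 1)), ((1, 0), (1, 1), (0, 1))]
SAMPLE = sample_in_triangulation(SQUARE, 200, RNG)
```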
2 About convexity
Here we provide tools related to convexity.
2.1 Inner approximation using random sampling
In this subsection, we provide results about the number of points that have to be sampled inside a convex body so that the area of the convex hull of the sample is a good approximation to the area of . We may think of as a maximum-area convex set in for which we aim to find a -approximation. In our algorithm, we sample points in a superset of , thus we also provide extensions to this case. In particular, Lemma 1 deals with the problem of sampling points inside a given convex body . In Lemma 3 the sample is taken from a larger polygon and the goal is to hit with at least points. These two results are then combined together in Lemma 4.
Lemma 1. Let be a convex body in the plane and let be a sample of points chosen uniformly at random inside . There is some universal constant such that, if , then with probability at least it holds that .
Proof. Let us scale so that it has area . We have to show that holds with probability at most .
Let denote the convex hull of points chosen uniformly at random in and define . Thus is the missed area, that is, the area of . Groemer  showed that is maximized when is a disk of area . Rényi and Sulanke  showed that for every smooth convex set there exists some constant , depending on , such that . This result also follows from a similar upper bound by Rényi and Sulanke  on the expected number of edges of and from Efron’s  identity . Both statements together imply that
where is the constant when is a unit-area disk. (From the results of , or subsequent works, one can explicitly compute that , so the constant is very reasonable.)
We set . Whenever , we can use Markov’s inequality to obtain
For convenience we will assume that for all . This is not problematic because we can replace with , if needed.
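The behaviour described in Lemma 1 can be observed numerically. The following sketch estimates the missed area for a unit square, which is an arbitrary choice of convex body made here for simplicity (the extremal case discussed in the proof is a disk). The sample sizes and the number of trials are also arbitrary.

```python
import random

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    """Shoelace formula."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))) / 2.0

def missed_area(k, rng):
    """Area of the unit square not covered by the hull of k uniform samples."""
    sample = [(rng.random(), rng.random()) for _ in range(k)]
    return 1.0 - polygon_area(convex_hull(sample))

def average_missed_area(k, trials, rng):
    """Empirical mean of the missed area over independent trials."""
    return sum(missed_area(k, rng) for _ in range(trials)) / trials

RNG = random.Random(0)
MISSED_SMALL = average_missed_area(50, 20, RNG)
MISSED_LARGE = average_missed_area(2000, 20, RNG)
```

The empirical mean of the missed area shrinks as the sample grows, which is the quantity that Markov's inequality converts into a probability bound in the proof.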
Lemma 3. Let be a convex body contained in a polygon , let be a random sample of points inside , and let be an arbitrary value. If
then with probability at least it holds that .
Proof. Let . The random variable is a sum of independent Bernoulli random variables, each with expected value
Standard calculations (or formulas) show that
We can now use Chebyshev’s inequality in its form
and the inequality to obtain the following:
Lemma 4. Let be a convex body contained in a polygon , let be a random sample of points inside , and let be the constant in Lemma 1. If
then with probability at least it holds that .
2.2 Outer containment in a parallelogram
In the previous subsection we have proved that, given a superset of , for samples of points in of a certain size, is a good approximation to the area of with positive constant probability. If we set , the size of might turn out too big to yield a subquadratic algorithm. For this reason, we want to find a smaller superset of to take the sample from. In this subsection we describe a method to find a parallelogram containing whose area is proportional to the area of , and we show that this parallelogram can be found with positive constant probability using a relatively small random sample from .
Let be a convex body in . We use to denote the -coordinate of a point . For each we define as the unique value satisfying
Thus, the horizontal line at height breaks into two parts and the lower one has a proportion of the area of . We further define
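For a convex polygon, the height at which a horizontal line cuts off a prescribed fraction of the area from below can be approximated by bisection, since the area below the line grows monotonically with the height. The sketch below is only an illustration of the definition; the function names, the use of bisection, and the number of iterations are assumptions made here.

```python
def polygon_area(poly):
    """Shoelace formula; returns 0 for fewer than three vertices."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))) / 2.0

def clip_below(poly, h):
    """Part of a convex polygon with y <= h (one Sutherland-Hodgman step)."""
    out = []
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if y1 <= h:
            out.append((x1, y1))
        if (y1 <= h) != (y2 <= h):
            # The edge crosses the line y = h; add the crossing point.
            t = (h - y1) / (y2 - y1)
            out.append((x1 + t * (x2 - x1), h))
    return out

def height_for_fraction(poly, fraction, iterations=60):
    """Height h such that the part of poly below y = h has the given area fraction."""
    total = polygon_area(poly)
    lo = min(y for _, y in poly)
    hi = max(y for _, y in poly)
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if polygon_area(clip_below(poly, mid)) < fraction * total:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Usage: a unit square and a triangle with a horizontal base.
H_SQUARE = height_for_fraction([(0, 0), (1, 0), (1, 1), (0, 1)], 0.25)
H_TRIANGLE = height_for_fraction([(0, 0), (2, 0), (1, 1)], 0.75)
```

For the triangle with base on the line y = 0 and apex at height 1, the area below height h is 1 - (1 - h)^2, so the fraction 0.75 is reached at h = 0.5.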
Lemma 5. For each convex body in
Proof. In this proof, let us drop the dependency on in the notation and set for each . We only show that ; the other inequality is symmetric.
For , let be the horizontal line with -coordinate . Let be a highest point of , let be the intersection of with , let be the line through and , let be the line through and , let be the intersection of with , and let be the intersection of with . See Figure 2.
By the convexity of , the triangle is contained in the portion of between and , and the portion of between and is contained in the trapezoid . Thus
The triangle is similar to the triangle with scale factor . By (1), the scale factor is at least , that is
For any two points and and any value , let , , and let denote the parallelogram whose vertices are the four horizontal translates of the points and by distance . See Figure 3, left. Note that and .
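A parallelogram whose vertices are the four horizontal translates of two points is easy to handle computationally. In the sketch below the two points `a` and `b` and the distance `delta` are arbitrary inputs; how the text derives its two points from the original pair is not reproduced here, and the function names are hypothetical.

```python
def horizontal_parallelogram(a, b, delta):
    """Vertices obtained by translating a and b horizontally by -delta and +delta."""
    return [(a[0] - delta, a[1]), (a[0] + delta, a[1]),
            (b[0] + delta, b[1]), (b[0] - delta, b[1])]

def parallelogram_area(a, b, delta):
    """Horizontal side of length 2 * delta times the vertical extent."""
    return 2.0 * delta * abs(b[1] - a[1])

def parallelogram_contains(a, b, delta, p):
    """True if p lies in the closed parallelogram; assumes a and b differ in y."""
    (ax, ay), (bx, by) = a, b
    if not min(ay, by) <= p[1] <= max(ay, by):
        return False
    # x-coordinate of the point of line ab at the height of p.
    t = (p[1] - ay) / (by - ay)
    center = ax + t * (bx - ax)
    return abs(p[0] - center) <= delta

# Usage with a = (0, 0), b = (1, 2) and delta = 0.5.
VERTICES = horizontal_parallelogram((0, 0), (1, 2), 0.5)
AREA = parallelogram_area((0, 0), (1, 2), 0.5)
```

The containment test only needs the height of the query point and its horizontal distance to the line through the two points, which mirrors the way horizontal distance is used in the containment argument.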
Lemma 6. Let be a convex body and assume that . Let and be points in such that
Then is contained in .
Proof. In this proof, let us drop the dependency on in the notation and set for each .
By Lemma 5 we have
Therefore is contained between the horizontal lines and . These are the lines supporting the top and bottom side of .
Assume, for the sake of a contradiction, that has some point outside . Since lies between the lines and , it must be that the horizontal distance from to is more than . See Figure 3, right. Since the triangle is contained in we would have
which is a contradiction. Therefore any point of is contained in . ∎
Lemma 7. Let be a convex body contained in a polygon , and assume that . If is a random sample of points inside with
then with probability at least it holds that contains two points and such that is an edge of and contains .
and consider the following events:
Lemma 3 implies
Applying the Fréchet inequality
which does not require any independence assumption, we obtain that
When and hold, there are points and and Lemma 6 implies that is contained in . Moreover, is an edge of because is a convex body contained in . ∎
2.3 Largest convex polygon in a visibility graph
In this subsection we give an algorithm to find a largest convex polygon whose edges are defined by a visibility graph inside a polygon. In our algorithm LargePotato, described in Section 4, the vertices of the visibility graph are points of a random sample in , and the algorithm in the current subsection is used to find the largest convex polygon defined by that sample.
Let be a visibility graph in some simple polygon. We denote the set of vertices and edges of by and , respectively. We assume that the coordinates of the vertices of are known. A set of vertices from is a convex clique if: (i) there is an edge between any two vertices of , and (ii) the points of are in convex position. The area of a convex clique is the area of .
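The definition of a convex clique can be made concrete with an exhaustive search over vertex subsets. This brute-force sketch takes exponential time and is not the dynamic program discussed in this subsection; it is only meant as a reference for tiny inputs. It assumes distinct points, and edges given as a set of `frozenset` index pairs; all names are hypothetical.

```python
from itertools import combinations

def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; collinear points are not reported as vertices."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))) / 2.0

def max_area_convex_clique(points, edges):
    """Best (area, index tuple) over all convex cliques with at least 3 vertices."""
    best_area, best = 0.0, ()
    for size in range(3, len(points) + 1):
        for subset in combinations(range(len(points)), size):
            # (i) every pair of chosen vertices must be joined by an edge.
            if any(frozenset(pair) not in edges for pair in combinations(subset, 2)):
                continue
            # (ii) convex position: every chosen point is a hull vertex.
            hull = convex_hull([points[i] for i in subset])
            if len(hull) != size:
                continue
            area = polygon_area(hull)
            if area > best_area:
                best_area, best = area, subset
    return best_area, best

PTS = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.4)]
ALL_EDGES = {frozenset(pair) for pair in combinations(range(5), 2)}
FULL = max_area_convex_clique(PTS, ALL_EDGES)
PRUNED = max_area_convex_clique(PTS, ALL_EDGES - {frozenset((0, 2))})
```

With all edges present, the four corners of the square form the best convex clique. Removing the diagonal between vertices 0 and 2 rules out the square, and the best convex clique becomes the quadrilateral on vertices 1, 2, 3 and 4.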
Let be a point of . We are interested in finding a convex clique of maximum area in , denoted by , that has as highest point. Thus we want
Lemma 8. For any point of , we can compute in time .
Proof. Pruning vertices, we can assume that all vertices of are adjacent to and below . We can then use the algorithm of Bautista-Santiago et al. , which is an improvement over the algorithm of Fischer , restricted to the edges that are in . For completeness, we provide a quick overview of the approach.
For this proof, let us denote . We sort the points of counterclockwise radially from . Let be the labeling of the points of according to that ordering. Thus, for each the sequence is a right turn.
Using a standard point-line duality and constructing the arrangement of lines dual to the points , we get the circular order of the edges around each point . For this we spend in total time [15, 22].
For each such that , let be the largest-area convex clique that has , , and consecutively along the boundary of . We then have
Taking the convention that , the values satisfy the following recursion
To argue the correctness of the recursion, one needs to observe that the right side of the equation does indeed correspond to the construction of a convex polygon.
For any fixed , the values , , can be computed in time, provided that the edges incident to are already radially sorted and the values are already available for all . To achieve linear time, one performs a scan of the edges incident to and uses the property that
forms a contiguous sequence in the circular ordering of edges incident to .
Thus, we can fill in the whole table in time . With this we can compute and construct an optimal solution by standard backtracking. See  for additional details. ∎
In Section 5 we will also need to find a convex clique whose convex hull has maximum perimeter. It is easy to modify the algorithm to compute, for a point , the value
and a corresponding optimal solution. Here we assume a model of computation where the length of segments can be added in constant time.
3 Probability for visibility
In this section we give a relation between and the probability that two random points in are visible. Such a relation is used later to bound the expected complexity of the visibility graph of a suitably sized random sample of points.
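The probability that two uniform random points of a polygon are mutually visible can be estimated empirically. The sketch below uses rejection sampling from the bounding box and a brute-force visibility test; both are simplifications chosen here, not the methods of the paper, and the estimate says nothing about the precise bounds proved in this section. The function names are hypothetical.

```python
import random

def orient(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def properly_cross(a, b, c, d):
    """True if segments ab and cd cross at a point interior to both."""
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)

def inside(poly, p):
    """Even-odd ray casting; boundary points have probability zero here."""
    x, y = p
    result = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            result = not result
    return result

def sample_in_polygon(poly, rng):
    """Uniform point in the polygon by rejection from its bounding box."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    while True:
        p = (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        if inside(poly, p):
            return p

def sees(poly, p, q):
    """Visibility of two interior points, assuming general position."""
    edges = zip(poly, poly[1:] + poly[:1])
    return not any(properly_cross(p, q, a, b) for a, b in edges)

def visibility_probability(poly, trials, rng):
    """Fraction of random pairs of points that see each other."""
    hits = 0
    for _ in range(trials):
        p, q = sample_in_polygon(poly, rng), sample_in_polygon(poly, rng)
        hits += sees(poly, p, q)
    return hits / trials

RNG = random.Random(0)
SQUARE = [(0, 0), (1, 0), (1, 1), (0, 1)]
L_SHAPE = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
P_SQUARE = visibility_probability(SQUARE, 500, RNG)
P_L_SHAPE = visibility_probability(L_SHAPE, 2000, RNG)
```

For a convex polygon every pair is visible, so the estimate is exactly 1. For the L-shaped polygon, a pair can fail to be visible only when the two points lie in different arms, which happens with probability 2/9, so the true probability is at least 7/9.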
A polygon is weakly visible from a segment in if, for each point , there exists some point such that .
Theorem 9. Let be a unit-area polygon weakly visible from a diagonal . Let and be two points chosen uniformly at random in . Then
and
Footnote 1: Item (i) is not used elsewhere in this paper. However, we believe that it is an interesting fact that strengthens Theorem 10 for weakly edge-visible polygons.
Proof. Without loss of generality we assume that is a horizontal segment on the -axis. In this proof we use to denote the -coordinate of a point . Since the event that has zero probability, we may assume that and . To simplify the notation, in this proof we use .
Consider first the point fixed. We first bound the probability that and are visible and to obtain the following:
This is seen by showing that the set of points satisfying and is inside a region of area at most .
We distinguish two cases:
Case 1): and are on the same side of .
Case 2): and are on opposite sides of .
Let us first consider case 1). We assume that , the other case is symmetric. Refer to Figure 4a). We know that sees some point on . We may assume that does not lie on the segment as the event that lies on has zero probability. We know that sees some point on . We have a generalized polygon (in which the sides and may cross or some of the vertices may coincide) whose boundary is in , and therefore the whole interior of is also in . Here we use that has no holes. If sees , we can choose so that and share a common point: indeed, if and are disjoint, then the polygon is simple and thus sees , so we can set to . Let be the common point of and . By our assumptions, .
Let be a horizontal line through and let be the intersection between and the segment . The interior of is made of two triangles, and , both contained in and thus each of them has area at most . The triangle degenerates to a point if .
For the triangle , we have , which implies that
If the triangle is not degenerate, we have . By the similarity of the triangles and , we have , which implies that
The condition (4) implies that is inside a trapezoid of height with bases of length and , which has area . This finishes case 1).
We now consider case 2). Refer to Figure 4b). Let be the maximum subsegment of that is visible from . Since the triangle is contained in we have
If sees , then the segment intersects the segment . Thus is contained in a trapezoid of height with bases of length and . Such a trapezoid has area
This finishes case 2).
Considering cases 1) and 2) together, for each fixed point we have
Since this bound holds for each fixed , it also holds when is chosen at random.
Because of symmetry we have
which proves part (i) of the theorem.
Part (ii) follows by a similar consideration using case 2) only. ∎
We can use a divide and conquer approach to obtain a bound for arbitrary polygons.
Theorem 10. Let be an arbitrary unit-area polygon. Let and be two points chosen uniformly at random in . Then
Proof. For this proof, let us set .
For each polygon there exists a segment that splits into two polygons, each of area at most . We recursively split using such a segment in each polygon, for levels. Thus, at the bottommost level, each polygon has area bounded by .
At each level of the recursion, where , we have polygons, which we denote by . In particular, . Since the polygons at each level are disjoint, we have
For each polygon , where , let be the segment used to split . Let be the portion of that is weakly visible from . At each level we have
Let be the event . Using the union bound and part (ii) of Theorem 9 we obtain
At the bottommost level , we can use that for each to obtain