Combining Geometric and Topological Information in Image Segmentation
Abstract
A fundamental problem in computer vision is image segmentation, where the goal is to delineate the boundary of an object in the image. The focus of this work is on the segmentation of grayscale images and its purpose is twofold. First, we conduct an indepth study comparing active contour and topologybased methods in a statistical framework, two popular approaches for boundary detection of 2dimensional images. Certain properties of the image dataset may favor one method over the other, both from an interpretability perspective as well as through evaluation of performance measures. Second, we propose the use of topological knowledge to assist an active contour method, which can potentially incorporate prior shape information. The latter is known to be extremely sensitive to algorithm initialization, and thus, we use a topological model to provide an automatic initialization. In addition, our proposed model can handle objects in images with more complex topological structures, including objects with holes and multiple objects within one image. We demonstrate this on artificiallyconstructed image datasets from computer vision, as well as real medical image data.
Keywords: Image segmentation, active contours, topological data analysis, shape analysis.
1 Introduction
1.1 The Segmentation Problem
The motivation of this paper is to study how geometric and topological information can be used together to improve existing methods in a specific, welldefined image analysis problem. Image segmentation refers to the problem of delineating the contour (or boundary) of objects within an image. This important problem has been extensively studied in the engineering, computer science, and statistics literature (Huang & Dom, 1995; Zhang, 1996; Arbelaez et al., 2011; Bryner et al., 2013). Classical statistical shape analysis methods (Kendall, 1984) and recently developed topological methods (Carlsson, 2009) can both be applied to this problem.
Intuitively, the topology and shape of the object are both important sources of information which should help guide segmentation. However, it is not completely clear at the moment on which datasets statistical shape analysis methods (Kendall, 1984; Joshi & Srivastava, 2009; Bryner et al., 2013) and topologybased methods (Paris & Durand, 2007; Letscher & Fritts, 2007; Carlsson, 2009) can both be applied. Datagenerating mechanisms and model assumptions are important for shape analysis, while topologybased methods usually consider a (static) point set at different scales, without specific consideration of the underlying datagenerating process.
In a broader sense, this paper is also an attempt to answer the open problem of comparing results from statistical procedures and topological data analysis (TDA) as asked by Wasserman (2018), by using the image segmentation problem as an example. The difficulty in conducting such a comparison is that in most statistical problems where TDA can be applied, there does not exist a precise measurement of performance. Conversely, for some wellformulated statistical problems, it is not always clear how TDA techniques can be applied. We deem the 2dimensional image segmentation problem to be a carefully crafted problem where both TDA and shape analysis will find natural applications and quantitative evaluation.
One perspective to image segmentation is to treat it as an unsupervised point clustering problem. Arbelaez et al. (2011) present a unified approach to the 2dimensional image segmentation problem and an exhaustive comparison between their proposed globallocal methods with existing classical methods. Following their insights, the image segmentation problem can be formulated as an unsupervised clustering problem of feature points. For a 2dimensional grayscale image, the dataset comes in the form of pixel positions representing the 2dimensional coordinates, and the pixel value at the same position representing grayscale values between 0 and 1 at that particular pixel location. The term feature space is used to describe the product space of the position space and pixel value space, whose coordinates are . For a grayscale image, the feature points will be clustered according to some criteria. The most straightforward way to cluster pixel points is according to their pixel value . This naive clustering method works well when the image is binary and without any noise, i.e., . One can also cluster these points according to their limit points under kernel weighted averaging, as shown in Paris & Durand (2007). Finer clustering criteria usually lead to better performance for a broader class of images. The final result is a clustering of pixel points, equivalent to grouping the points according to some criterion, e.g., their corresponding density modal sets in the binary case. As pointed out by Paris & Durand (2007), the resulting hierarchical clustering structure is intrinsically related to multiscale analysis, which can be studied by looking at level sets of pixel density functions.
A different perspective to the image segmentation problem is to treat it as a curve estimation problem, as described in the active contour literature, examples of which include work by Caselles et al. (1997); Joshi & Srivastava (2009); Bryner et al. (2013). In this setting, the goal is to estimate the boundaries of objects within images. In other words, the output is a curve in , which separates a targeted object of interest from the image background. In this approach, the boundary curve is estimated by minimizing an energy functional defined by image and auxiliary curvedependent terms (Joshi & Srivastava, 2009; Bryner et al., 2013). One of the most influential papers in this area is Kass et al. (1988), referring to active contour models as “snakes.” A typical energy functional is comprised of two terms. One uses pixel value information both inside and outside the current curve iteration to “push” regions of the curve towards important features in an image. The other term tries to balance this energy with an inherent desire for smooth object boundaries. Following this line of research, Zhu & Yuille (1996) developed the region competition approach, and Caselles et al. (1997) derived a different method also based on the idea of active contours. In this paper, we adopt the Bayesian active contour approach proposed by Joshi & Srivastava (2009) and adapted by Bryner et al. (2013). Under this approach, the energy functional is designed from training images with known segmentation, yielding a supervised method.
1.2 Clustering and Topological Data Analysis
Topological data analysis (TDA) has witnessed rapid growth in the past decades. Many interdisciplinary applications make use of topological summaries as a new characterization of the data, which provide additional insight to data analysis. TDA techniques have also been an essential part of applied research focusing on image processing, in particular, the segmentation problem (Paris & Durand, 2007; Letscher & Fritts, 2007; Gao et al., 2013; Wu et al., 2017). The topological summary is naturally endowed with a hierarchical structure that can be used in various clustering tasks, including image segmentation. Singh et al. (2007) pointed out that this clustering feature can also be used to analyze highdimensional datasets.
In the TDA pipeline given by Paris & Durand (2007), a simplicial complex can be constructed as a topological space that approximates the underlying topology of the dataset. Nested complexes called filtrations can be introduced to describe the topology of the dataset at different scales. In a filtration, we control the scale by adjusting a scale parameter. A special kind of complex (i.e., MorseSmale complex) can be constructed from the level sets of a density function , whose scale parameter is chosen to be the level parameter of level sets.
Based on the grayscale image, we can estimate its grayscale density using a (kernel) density estimator . Complexes based on superlevel sets of estimated grayscale density function of images are constructed as described above. In Paris & Durand (2007), the scale parameter is chosen to be related to level sets of the density estimator, known as a boundary persistence parameter. The topological persistence for these types of complexes is used to separate the modes of the kernel density estimator for grayscale values constructed from an image. Intuitively, the boundary persistence of a mode describes its lifetime on the chosen scale. Modes which are more “welldefined” will be more persistent. The more persistent a topological feature is in such a complex, the larger the difference between the interior and exterior grayscale density values of this level set. A highly persistent feature will indicate a sharper change of pixel density values, which more likely represents a boundary.
By varying the threshold of the boundary persistence parameter defined by a difference of values between two different modes, we can construct a hierarchical clustering structure. Then, this persistencebased hierarchy describes the topological structure of the feature space. Clustering can be performed using this hierarchical structure, and then we select those modal regions which persist long enough in terms of the boundary persistence. A region whose boundary shows sharp contrast to its neighboring regions will be picked up by their method. We will choose this as our primary approach for extracting the topological information from the image, which is also the topology of the density estimator . We also point out that there exist other competing topological methods of segmentation based on certain types of representative cycles (Dey et al., 2010; Gao et al., 2013). These methods are more closely related to the philosophy of treating contours as curves, instead of the clustering philosophy, but they have the advantage of not assuming the additional meanshift model, which will be discussed later.
1.3 Contours and Shape Analysis
As mentioned previously, image segmentation can also be formulated as a contour detection, or equivalently, a curve estimation problem (Kass et al., 1988). In this setting, one treats the contour of a target object in the image as a curve in , which minimizes some welldesigned functional. We present a brief overview of active contours and statistical shape analysis, the latter of which is necessary to understand the models proposed by Joshi & Srivastava (2009) and Bryner et al. (2013). For a comprehensive review on theory and application of shape analysis, we refer the reader to Joshi et al. (2007).
Shape can be defined as the underlying structure of an object which is invariant to certain transformations deemed shapepreserving: rigidmotion and scaling. In other words, applying any combination of translation, rotation, and scale transformations may change the appearance of the object’s contour in , but preserves its shape. Shape analysis research focuses on the representation of these objects in a way which respects these invariances, along with the development of statistical procedures for shape data. Early work in the field focused on the representation of shape by a set of finite, labeled points known as landmarks. After landmark selection, statistical modeling proceeds using traditional multivariate techniques, with mathematical adjustments to account for the structure of the data. Kendall (1984) pioneered the landmark representation, with other key work by Bookstein (1986) and Dryden & Mardia (2016).
However, in the contour detection setting, we can treat the underlying contour as a closed curve on , parameterized by (the unit circle). In the analysis of curves, the issue of registration becomes prominent. Correspondence of features across curves is dictated by curve parameterization. However, invariance of curves to parameterization is often desired, and is difficult to account for mathematically. Early work addressed this by standardizing curves to be arclength parameterized (Zahn & Roskies, 1972; Younes, 1998), which has been shown to be suboptimal for many problems (Joshi et al., 2007), as it imposes a strict, potentially unnatural correspondence of features across a collection of curves. Recent work in elastic shape analysis resolves these issues by exploiting parameterization invariance to allow for flexible reparameterizations which optimally match points between curves, commonly referred to as registration (Srivastava et al., 2011). This allows for a more natural correspondence of features and improved statistical modeling of curves and shapes.
In the active contour approach, the model treats the problem of segmentation as finding a 2dimensional curve in the (larger) region that encompasses the clusters consisting of region . A classical model from this perspective is proposed by Mumford & Shah (1989), in which the goal is to minimize the energy functional,
(1) 
over all piecewise smooth curves in , where is the curve representing the true object boundary, is the volume of the area represented by , and are tuning parameters. The hope is that this finds the “correct” contour which delineates the region (Arbelaez et al., 2011). This idea can be generalized formally as a Bayesian model which incorporates prior information (Joshi & Srivastava, 2009; Bryner et al., 2013), where the target energy functional to be minimized is now,
(2) 
where are userspecified weights which control the contribution of each term to the energy functional. The quantity contains a term which is only dependent on the image (i.e., the data). The term allows one to insert prior information about the knowledge of the shape of the region, which could have significant influence in the segmentation result, particularly if the region of interest is difficult to identify.
1.4 Organization
Our contribution in this paper is multifaceted. We first perform extensive simulation studies for artificial images to compare the topologybased method (from Section 1.2) to the active contour method (from Section 1.3). This serves as an attempt to answer the question asked by Wasserman (2018). Following this comparative study, we propose a combined method which takes advantage of the strengths of these two existing methods. Lastly, we provide insights to the choice of segmentation method based on the type of image and noise perturbation present.
The rest of the paper is organized as follows. In Section 2, we specify the topologybased (TOP) and Bayesian active contour (BAC) segmentation methods, and their underlying model assumptions. The gradientdescent algorithm for BAC can be derived analytically. Then, we state our proposed method, which we call TOP+BAC. This method combines the two aforementioned techniques, and we discuss the merits of it. Section 3 is divided into several parts. In Section 3.1, we briefly discuss different performance measures that we use to evaluate performance. We then examine performance of TOP and BAC separately, and compare to the proposed TOP+BAC using an artificial image of a single, topologically trivial object, perturbed by various types of noise. Quantitative comparison of methods (through evaluation of performance measures) and qualitative guidance as to their utility under different noise settings is provided. Following this, we examine a topologically nontrivial object, where the proposed TOP+BAC exhibits its full power. Finally, we look at skin lesion images (Codella et al., 2018) to investigate the performance of TOP+BAC for images which potentially contain multiple connected components. We conclude and discuss future work in Section 4. Supplementary Materials are available for this paper.
2 Model Specification and Implementation
In this section, we outline three different models for segmentation of an image. Section 2.1 describes the topologybased method (TOP) of Paris & Durand (2007). The method of Bayesian active contours (BAC), as described by Joshi & Srivastava (2009) and Bryner et al. (2013), is discussed in Section 2.2. This method treats segmentation as a curve estimation problem, with ideas from shape analysis used to allow incorporation of prior information. Finally, Section 2.3 outlines our proposed method, TOP+BAC, which combines TOP with BAC in a very specific way to produce a “topologyaware” segmentation.
2.1 Topological Segmentation (TOP)
In this section, we describe a topological segmentation method based on a meanshift model (Paris & Durand, 2007). This is an unsupervised clustering method which operates on sets of pixels. First, we estimate the pixel density using a kernel density estimator with Gaussian kernel . The bandwidth of this Gaussian kernel is chosen separately for the scales of the subspace of the 2dimensional pixel position coordinates and the grayscale value space. Topological persistence is examined by constructing a filtration based on the estimated pixel density . In particular, given a level parameter , we form superlevel sets of the density estimator . This filtration yields a hierarchical clustering structure.
Suppose that we know all the locations of maxima from the pixel density. Then, one can compute the boundary persistence between two local maxima , satisfying with a saddle point between these two points. This value conveys relations between modal sets around of the estimated density . Following this intuition, sets of pixels corresponding to modes with boundary persistence greater than a predetermined threshold can be collected to form a hierarchical structure of clusters, ordered by their persistence. Paris & Durand (2007) showed that the hierarchical structure provided by the boundary persistence is equivalent to the topological persistence based on level sets of the density estimator. Thus, we can proceed by clustering based on boundary persistence, rather than calculating the whole filtration of superlevel sets of density estimator .
A natural question which arises with boundary persistence is how the locations of pixel density maxima are obtained. We can extract these pixel density modes by using the meanshift algorithm, a classic method from cluster analysis. For a dataset , the sample mean within the distance of is defined to be , and the meanshift at is defined to be (Cheng, 1995). Intuitively, we first take the “average center” of those observations that are lying within a ball of the , and then compute the deviation of to this center. We can generalize these notions by replacing the flat kernel centered at with the Gaussian kernel, which we use exclusively in the rest of paper following Paris & Durand (2007). For the convenience of discussion, we define the general sample mean associated with kernel as , with weight function .
Under this definition of the general sample mean, the meanshift algorithm will iterate a point by the formula , generating a sequence of points . It can be shown that the iterative algorithm stops when . The meanshift sequence corresponds to a steepest descent on the density , where is the socalled shadow kernel associated with the kernel (Cheng, 1995). As specified above, after performing the meanshift algorithm to extract pixel density modes, we can then compute the boundary persistence, and obtain a hierarchical clustering structure by taking all modes with boundary persistence greater than the userspecified threshold value . We refer the reader to Paris & Durand (2007) for details of the algorithm used for segmentation, as they describe the meanshift model to its full extent. We will indicate the parameters of TOP as hereafter.
The resulting segmentation of TOP will be used to initialize the gradientdescent algorithm in an active contour model, which is presented in Section 2.2. This provides useful topological information and helps stabilize its result. In addition, the aforementioned active contour method can be thought of as a way to smooth the segmentation obtained using TOP.
2.2 Bayesian Active Contour (BAC)
2.2.1 Energy Functional Specification
In this section, we describe the Bayesian active contour (BAC) approach to segmentation. Active contours seek a contour curve delineating the target object from the background by minimizing an energy functional. The model proposed by Joshi & Srivastava (2009) and Bryner et al. (2013) uses the following functional:
(3) 
where are userspecified weights. Contrary to the unsupervised TOP method of Section 2.1, the supervised BAC model assumes that we have a set of training images with known object boundaries, and a set of test images on which we perform segmentation.
The image energy term, , is given by the following expression,
where denotes the pixel value at location , and denotes the interior and exterior regions of the curve, respectively. The quantities are kernel density estimates of pixel values for the interior and exterior of the true contour, respectively. In Joshi & Srivastava (2009) and Bryner et al. (2013), these densities are estimated from training images, for which true contours are known. From these, one can use a histogram (if the image is truly binary) or kernel density estimator (for general grayscale images) for pixel values that are found inside and outside the true contour, provided that the test image resembles training images in terms of pixel composition. Intuitively, we may interpret this image energy term as describing how sharp the contrast in the image is between the interior and exterior regions of a single curve .
The smoothing energy term, , for contour is given by,
where is the Jacobian of the curve at pixel coordinates . This quantity is approximated numerically, given the discrete nature of an image. In general, the more wiggly is, the larger will be. Thus, depending on the weight prescribed to this term, one can control the desired smoothness of the solution.
The final term, , is the primary contribution of Joshi & Srivastava (2009) and Bryner et al. (2013). This term quantifies the difference between the contour and the mean shape of training curves, and the choice of controls how much the contour is pushed towards this training mean shape. Suppose we have ground truth contours from training images denoted by , where and (for closed curves). Then, the prior energy term is,
where is a vector pointing from the shape of the contour to the sample mean shape of the known contours from training images, , are eigenvector and eigenvalue matrices, respectively, describing the various directions of shape variation among the ground truth contours, and is selected to be less than the smallest eigenvalue in . This term is equivalent to the negative loglikelihood of a truncated wrapped Gaussian density on the space of shapes, where ensures this density is welldefined. Explicit details about this term’s construction are found in Section 1 of the Supplementary Materials. Use of a shape prior can help in segmentation of images where the object is not fully observable, for instance, in the presence of occluded boundaries or low signaltonoise ratio. If training data is available, then this term allows one to downweight information provided by pixel values to push the contour towards the desired shape.
2.2.2 GradientDescent
The standard way to minimize the energy functional in Equation 3 is via a gradientdescent algorithm. Given current contour , we update by moving along the negative gradient of as follows:
Updates proceed sequentially until the energy term converges (i.e., the rate of change of the total energy is below some prespecified convergence tolerance), or a maximum number of iterations is reached. This requires specification of an initial contour , which can be chosen in many different ways. Unfortunately, as gradientdescent is only guaranteed to find local extrema, final segmentations can be extremely sensitive to the choice of initialization curve .
The gradient terms are given by:
where is the outward unit normal vector to , is the curvature function for , is a function which depends on the sample mean shape of training contours , and controls the step size (typically chosen to be small) of the update from towards the mean shape. At a particular point along the contour , the image update moves this point inward along the normal direction if , meaning that this current image location is more likely to be outside the contour than inside, with respect to the estimated pixel densities and . If the opposite is true, then the point on the contour is updated outward along the normal direction, “expanding” the contour. The smoothness update uses curvature information to locally smooth the curve. Without the action of other energy updates, this term will push any contour towards a circle. Finally, the prior update term aims to evolve the current contour a small amount in the direction of the sample mean shape of the training data. Hereafter, we will indicate the parameters of BAC as . The explicit form for in the shape prior is found in Section 1 of the Supplementary Materials; we also refer readers to Joshi et al. (2007); Srivastava et al. (2011); Kurtek et al. (2012) and Srivastava & Klassen (2016) for further details.
2.3 Bayesian Active Contour with Topological Initialization (TOP+BAC)
As noted in Section 2.2, Bayesian active contour requires a gradientdescent algorithm with initiallyspecified contour . Since gradientdescent searches for local extrema, there is no guarantee that the final contour represents the target of interest. In addition, many images feature objects with more complex topological structure (e.g., objects with multiple connected components and/or holes within). We propose a new approach, Bayesian active contour with topological initialization (TOP+BAC), as a combination of the methods from Sections 2.1 and 2.2. This addresses both issues:

TOP can provide an automatic way of initializing a single contour, in the hope that the initializationsensitive, gradientdescent algorithm converges efficiently to correct target.

TOP can be used to initialize multiple active contours simultaneously, for object boundaries which cannot be represented by a single curve due to its topological nature.
TOP+BAC can be thought of as first using TOP to provide a rough segmentation which is able to identify topological features of interest, followed by refinement of those results using BAC in order to provide a finer segmentation. In other words, the topological method provides an initial point for gradientdescent. This allows for increased interpretability and consistency in terms of topological properties.
Currently, it appears that there is not an agreed generalization of active contour methods for multipleobject images with nontrivial topological structures (e.g., an image with numerous donuts). Using the proposed twostep method, we can extend most current active contour methods to multipleobject images, as TOP provides a natural partition of an image into several regions, where only one object lies in each region. We can also segment a single topologically complex object by performing separate active contour algorithms on components identified by the topological method. In addition, by treating each topological feature as a separate curve, one can then incorporate prior information in BAC and control smoothness separately for each curve, which cannot be achieved by using TOP alone. If an interpretable segmentation which respects the underlying topological structure is of utmost importance, TOP+BAC will enhance individual results from TOP or BAC.
3 Data Analysis
In this section, we apply different methods to several simulated images, as well as real data. We begin by looking at a binary image of an object with a single connected component and no loops, the most simple topological structure. We demonstrate that BAC, TOP, and TOP+BAC all identify the true boundary contour very well in this situation, as expected. In practice, images are usually contaminated with some noise. Thus, we follow with an investigation into how these methods perform when the original binary image is contaminated with several types of noise. Then, we extend the simulation to an object with more complex topological structure, and show that TOP+BAC is necessary to correctly identify the boundary in this setting. We conclude with a study of the performance of TOP+BAC on skin lesion data (Codella et al., 2018), which features images with multiple connected components.
3.1 Performance Evaluation Measures for the Segmentation Problem
To evaluate the performance of the segmentation result, we need to choose appropriate performance evaluation measures that compare our segmentation with the true segmentation of the image, known as the ground truth (Abdulrahman et al., 2017). As mentioned earlier, based on the method applied, the output of the contour comes in two different forms: TOP yields clusters of pixel points, whereas BAC and TOP+BAC will produce contours representing the boundaries of the segmented regions. A comprehensive introduction of several performance evaluation measures has been given by Zhang (1996). Most of these measures (Zhang, 1996; Monteiro & Campilho, 2006; Abdulrahman et al., 2017) assume the segmentation result and truth come in the form of clusters, as is the case with TOP. In this setting, the objects under comparison are two collections of regions consisting of pixel points. In order to apply these measures to BAC and TOP+BAC, we convert the contour curves into pixel clusters in images by taking each curve’s interior to be a pixel cluster using MATLAB. If we simply want to compare the contours of the segmentation clusters instead, then an appropriate measure should be defined on the space of all possible curves.
We choose to use the following performance measures to evaluate segmentation results: the Hausdorff distance , normalized Hamming distance defined in Huang & Dom (1995), the Jaccard index metric (Jaccard, 1901; Späth, 1981), and the statistical performance measure PM suggested by Abdulrahman et al. (2017). Finally, since the result of the proposed TOP+BAC algorithm is an estimated contour, we can compare this to the true contour’s shape by computing their elastic shape distance defined in Srivastava et al. (2011), explicitly stated in Section 1 of the Supplementary Materials. For topologically nontrivial objects with boundaries represented by multiple contours, we can simply compute the elastic shape distance between corresponding estimated and ground truth contours separately.
3.2 Simulation Studies for Object with Trivial Topological Features
3.2.1 Binary Image with No Noise
To begin, consider bone images selected from the MPEG7 dataset, referenced by Bai et al. (2009), among others. In this setup, we start with the contours of 20 bone curves (represented by points each), and construct binary images with the interior of each bone in white, and the exterior in black. To examine the performance of contour detection under the models discussed in Section 2, we select one image as the test image, leaving the remaining 19 as training images. Since the images have been constructed based on the underlying bone contours, we have ground truth segmentations for both training and test sets, thus allowing us to quantify performance of the aforementioned algorithms by comparing the ground truth to estimated segmentations using different performance measures. For binary images, we can estimate the interior and pixel densities using a histogram estimator.
Figure 1 shows the final segmentation under the BAC, TOP and TOP+BAC approaches. For BAC, we selected , and (i.e., no shape prior update). For TOP, we empirically selected (i.e., select all topological features with boundary persistence greater than 5). We argue that a shape prior is not necessary for a simple binary image, since it is clear what the target object is to be segmented. Choice of the update parameters is subjective and specific to the image. If there is no prior shape information and there is a clearly identified object in the image, we suggest setting , allowing the image and smoothness terms to guide updates of the active contour. The image update parameter can be chosen somewhat subjectively, especially for a simple binary image. Choosing to be large can potentially induce selfintersections of the contour (see Section 2 in Supplementary Materials), particularly in regions of high curvature. We have found that selecting between 0.15 and 0.5 is generally appropriate for many of the images we have examined. The smoothing parameter is chosen to be relatively small, as the boundary is already fairly smooth here. This can, however, be adjusted by the user to ensure a final contour which does not have many small bumps.
As stated earlier, the BAC algorithm requires an initialization of the contour, and the result can be highly sensitive to this specification. Thus, for BAC, we handselected the contour which is shown in red on the leftmost panel of the Figure 1. However, TOP+BAC uses the TOP result to initialize the contour. In this case, the method works well, as the result of the meanshift algorithm is a contour which is essentially right about where the true boundary occurs (as shown in the middle and right panel of Figure 1), and TOP correctly identifies the objects’s topological structure. In this setting, TOP may be the best tool of use, as BAC methods rely on gradient updates, which can introduce slight numerical errors to the estimated contour. TOP+BAC is not necessary, as the image features a topologicallytrivial object, and TOP by itself segments this object well. The left panel of Table 1 shows performance measures from Section 3.1 evaluated for the segmentation results using BAC, TOP, and TOP+BAC. Note that for all measures, the results are quite comparable.
(i) No noise  (ii) Gaussian perturbation  (iii) Gaussian blur  

Measure  BAC  TOP  TAC  BAC  TOP  TAC  BAC  TOP  TAC 
4.6904  3.1623  3.8730  4.6904  3.7417  4.1231  4.6904  5.5678  4.6904  
0.0031  0.0013  0.0037  0.0030  0.0015  0.0030  0.0046  0.0077  0.0046  
0.0204  0.0092  0.0241  0.0216  0.0110  0.0222  0.0324  0.0518  0.0320  
PM  0.0251  0.0106  0.0290  0.0249  0.0126  0.0248  0.0359  0.0582  0.0358 
0.0985  0.0223  0.0919  0.1162  0.0323  0.0888  0.0699  0.1315  0.0716 
3.2.2 Binary Image with Binary Noise
Next, we add two different types of binary noise to the binary bone image. First, consider adding (binary) saltandpepper noise (Marques, 2011). This is done in MATLAB using the function (with a prespecified noise density, for which we select its parameter to be ), perturbing the image by adding black pixels to the interior, and white pixels to the exterior (as in the left side of Figure 3). Since the resulting image is still binary, we can still perform image updates by estimating interior and exterior pixel densities using the histogram estimator.
Unfortunately, we cannot use an automatic topological initialization via TOP when there is a large amount of noise, due to the fact that the resulting boundary image features numerous small boundaries due to the image noise. We generally hope to initialize TOP+BAC by the largest connected component (in terms of the area contained within the component) from TOP, but this is not feasible. Thus, we recommend using BAC over TOP in this setting. One should note that the amount of this type of noise is quite rare in real image applications, particularly in the background of a scene.
Instead of perturbing image pixel values, we can also perturb the underlying contour by simply adding a bivariate meanzero Gaussian random number with standard deviation 3 to each column (corresponding to Euclidean coordinates) of the vector representing this contour. This creates a jagged bone from the smooth bone contour (see Figure 2). However, since the image is still binary and has no noise, we expect BAC, TOP, and TOP+BAC to perform equally well. This is the case, as illustrated in Figure 2, with identical BAC settings and TOP settings . Corresponding performance measures are found in the middle portion of Table 1.
Both of the examples shown in Figure 3 can be used to illustrate the crucial effect that the prior term can have on gradient updates in active contour algorithms. In the image with saltandpepper noise, judicious use of prior shape knowledge can help drive the update towards a suitable contour in many fewer iterations than without a shape prior. This is because the prior term only involves the contour information from ground truth segmentations of training images, without any regard for the signaltonoise ratio within the image itself. This is shown in the left panel of Figure 3, where we now set . While the targeted bone does not perfectly match the shape in the image, we obtain a biased estimate which ignores the noise to some degree. On the right panel, we illustrate the idea that use of the prior can also help average out the random noise along the boundary when segmenting the test image, producing a smooth final contour despite results from TOP or traditional active contour methods a jagged contour. A decision of whether to use the prior term is applicationspecific. For most images, we suggest setting , particularly when a topological initialization is used. However, if the object is not fully observable in the image, whether due to binary noise along the boundary or occluded features as remarked by Bryner et al. (2013), use of the prior term can be beneficial to segmentation.
3.2.3 Binary Image with NonBinary Noise
Next, we look at adding nonbinary noise to the original binary bone image, as this is a more common source of noise for images arising from standard applications. This means that these new grayscale images will take continuous values between 0 and 1, rather than only 0 or 1. To accommodate this, we now use kernel density estimates, rather than histogram estimates, of interior and exterior pixel densities in order to perform updates using the image energy term. We consider adding two types of noise: Gaussian blur (using MATLAB function ) and Gaussian pixel noise (using the MATLAB function ).
First, in Figure 4, we consider the binary bone image polluted by Gaussian blur, which makes boundary identification more challenging. Using the same settings for TOP and BAC as the previous examples, we note that BAC and TOP perform well on their own, and that TOP+BAC helps to smooth out some of the small bumps that are visible in the TOP segmentation. Since TOP treats segmentation as a pixel clustering problem, the resulting contour may not be as smooth as desired, depending on the specified parameters. In addition, for this image with blur, varying the TOP parameters can either lead to results which capture the bone perfectly, or estimates a larger bone than the ground truth. In this case, using this as an initialization for TOP+BAC allows one to refine the segmentation towards the true contour by using the estimated pixel densities from training images. The right panel of Table 1 shows performance measures for this example.
We could also consider the same image contaminated by Gaussian pixel noise. Under the same settings for TOP, we observe similar behavior to the saltandpepper noise example from Section 3.2.2, where a topological initialization is not available due to the signaltonoise ratio in the image. This means a single connected component is difficult to identify from the TOP boundary map used to initialize TOP+BAC. Thus, with a large amount of nonbinary noise, we do not recommend use of TOP or TOP+BAC. Similar to Figure 3, one can set in BAC to help the contour update with less emphasis on the added pixel noise, if some bias in the estimate is deemed acceptable.
3.3 Simulation Studies for Objects with NonTrivial Topological Features
In the previous simulations, we have looked at a binary bone image which has been contaminated with various types of noise. In some settings, TOP is hopeless while BAC renders itself somewhat useful, while in other cases, TOP performs as well or slightly better than BAC. However, in all of these cases, a topological initialization for active contour algorithms does not seem to result in much improvement. This is because the underlying object in the image is trivial in a topological sense (i.e., one connected component with no hole). To explore the true potential of the TOP+BAC method, we now consider artificiallyconstructed donut images, which have a more complex topological structure due to the presence of two boundaries and one hole in the middle (see Figure 5).
3.3.1 Binary Image with No Noise
First we consider a binary donut image as the test image, which has been artificially constructed from two curves, so that the ground truth boundaries are known. In order to estimate pixel densities, we form 9 more binary donut images as training images, constructed from sets of ground truth curves. Each donut consists of a large, outer boundary, and a smaller, inner boundary, and we assume that the labels of ground truth curves (i.e., outer or inner boundary) from the training images are known. If this is the case, we estimate the interior and exterior pixel densities and for (outer contour curve) and (inner contour curve) marginally.
We first demonstrate the inability for BAC to correctly account for the topological structure of this donut image. To do so, we initialize a single curve within the white region of the donut image, and apply the usual BAC algorithm, with parameters . Figure 5 shows the final contour obtained from the same initial contour, where the only difference is which estimated pixel densities are used (i.e., corresponding to the outer or inner boundary). Note that both capture one of the intended boundaries. However, the contour is not able to split into two separate contours, in order to capture both the outer and inner boundaries simultaneously. This can be misleading from an interpretability perspective, and is a drawback to current active contour methods.
Measure  (i) No noise  (ii) Gaussian perturbation  (iii) Gaussian blur 

3.6056  4.3589  7.2111  
0.0025  0.0033  0.0436  
0.0185  0.0247  0.2365  
PM  0.0194  0.0261  0.2567 
Outer  0.0431  0.0962  0.0360 
Inner  0.0805  0.1471  0.0632 
The right panel of Figure 5 shows the result of using a topological segmentation to initialize. We easily identify two boundary curves which correspond to the outer and inner boundaries of the donut. TOP+BAC then runs separate active contour algorithms simultaneously and independently, using their respective estimated pixel density functions (since the ground truth segmentations are also known to be comprised of inner and outer boundaries). Similar to the bone example, the topological initialization is quite close to the true boundary. The benefit of the proposed method is the ability to identify the two boundaries, rather than simply specifying two separate initializations due to a priori knowledge about the topological structure of the donut. The left panel of Table 2 shows performance measures applied to the final segmentation formed by the enclosed portion of the two estimated contours. All of these values are comparable to those which were seen in the bone example of Table 1, indicating that TOP+BAC has performed well.
3.3.2 Binary Image with Noise
In the previous section, we have seen the benefit of TOP+BAC for images with nontrivial topological features. To wrap up the simulation study, we also test performance of TOP+BAC on the types of artificial noise for which a topological initialization is feasible. In the study of the bone image, we found that Gaussian blur and Gaussian contour perturbations still allow for an easily interpretable estimated boundary from the topological meanshift algorithm. Figure 6 illustrates the two estimated boundaries under both of these types of noise. In both cases, two boundaries are easily identified from the topological initialization.
Interestingly, in the blurred donut image, the contours appear to expand slightly out past what one would expect to be the true contour; this is caused by the use of the individuallyestimated pixel density functions for each feature, in conjunction with the grayscale pixel values. This is reflected in the performance measures shown in the middle and right panels in Table 2. While the shape is close to the truth, the actual coverage as compared to true interior pixels is a bit off, which explains larger values in all performance measures except the elastic shape distance. The right panel of the same table shows similar results for a Gaussian perturbed contour. Note that shape distance is inflated due to the added Gaussian noise in the true boundary; many of the other measures are also slightly inflated due to the added noise.
3.4 Real Data Evaluation
The method of TOP+BAC is not only applicable to images of a single object with complex topology, but also provides a powerful way to handle images with multiple objects. In order to demonstrate the benefit of our proposed method for images with multiple objects, we conduct segmentation on the ISIC1000 skin lesion dataset from Codella et al. (2018). The input data are dermoscopic lesion images in JPG format. The lesion images are taken from a historical sample of patients presented for skin cancer screening. These patients were reported to have a wide variety of dermatoscopy types, from many anatomical sites. A subset of these images have ground truth segmentations, which were reviewed and accepted by a human expert.
It is important to note that all of the “true” segmentations were identified as one connected component with no holes. However, there are many lesion images that consist of multiple welldefined connected components, thus making TOP+BAC beneficial for this task. Practically, it is of medical importance to identify whether a lesion consists of multiple smaller regions or one large region (Takahashi et al., 2002). We selected two images featuring benign nevus lesion diagnoses. The first image, shown as (a) in Figure 7, appears to have two separate connected components corresponding to the lesion areas. In order to apply TOP+BAC, we require a training set of images to estimate interior and exterior pixel densities. To do this, we select nine other benign nevus skin lesion images with an identified ground truth segmentation. We note that these segmentations are marked as one connected component with no holes, leading to a mismatch with our suspected belief about the topological structure of the test image. To remedy this, we estimate just one set of pixel densities and , as opposed to seperate densities for each connected component. In addition, we use TOP to initialize BAC.
(a)  (b)  

Figure 7(a) shows the result of TOP+BAC applied to one of the benign nevus lesions. The left panel shows the result of the topological meanshift algorithm. Note that there are some extraneous boundaries, but the largest two coincide with the two separate connected components. One should note that while there are additional connected components in those regions, they are identified by MATLAB’s function as being contained inside the outermost boundaries, and thus, we are able to easily extract the two outer boundaries. If we initialize with these contours, and run two separate active contours using weights , we obtain the final segmentations on the right side of the figure after 300 iterations. We justify setting by the quality of the ground truth, since some of the handdrawn true segmentations are much coarser than others. In addition, the prior term requires shape registration, which can be challenging in the presence of noisy boundaries, as is the case with these skin lesions. It is interesting to note that while two connected components are easily identifiable visually from the image, the two active contours resulting from TOP+BAC in terms of physical distance are actually very close to each other due to the lack of separation. Changing the bandwidth of the kernel used to estimate interior and exterior pixel densities can help address this issue (Section 3 of the Supplementary Materials).
Figure 7(b) shows a second benign nevus image, again using nine training images to estimate pixel densities. Under the same settings as the other example in this section, we use TOP to initialize two active contours. In order to initialize around the two regions of interest, we can use TOP to cycle through connected components until we have selected two which are in the approximately correct regions. Note that in this case, the two regions are very wellseparated, and thus, we do not run into the same phenomenon of slightly overlapping regions due to the gradient updates being applied separately for each contour. This is clearly a case in which a single active contour will not represent this lesion appropriately, whereas TOP may identify too many spurious boundaries which do not correspond to the actual target of interest (the rash).
4 Conclusion
4.1 Discussion
The motivation of this work is to attempt to combine two kinds of information (i.e. geometry of shapes and topology) in image segmentation when they are both available. It also attempts to address the critical comment made by Wasserman (2018): “…no effort was made to compare these (TDA) methods to simpler, more traditional statistical methods”.
In the first half of our paper, we briefly reviewed two different philosophies of image segmentation problem, identifying TOP (treating segmentation as a clustering problem via meanshift) and BAC (treating segmentation as a curve estimation problem) as representative methods for each. TOP (Paris & Durand, 2007) is a topologybased method, which can also be used for clustering, while BAC (Joshi & Srivastava, 2009; Bryner et al., 2013) is a method which combines traditional active contour segmentation approaches with shape analysis. We evaluated their performances, both qualitatively and quantitatively using various performance measures (Huang & Dom, 1995; Monteiro & Campilho, 2006; Arbelaez et al., 2011). Based on this systematic study, we described their respective advantages and drawbacks in different scenarios and provided detailed suggestions.
In the second half of our paper, we propose a method that combines TOP and BAC. Both methods individually require parameter tuning, and more often than not, are very sensitive to this choice. TOP generally yields topologicallyconsistent but overly conservative, nonsmooth contours and includes many noisy regions. BAC is not necessarily “topologyaware” and is highly dependent on gradientdescent algorithm initialization, but provides rather smooth contours and is not as sensitive to noise. The proposed TOP+BAC method uses the segmentation result of TOP as an initialization of the BAC. This twostep method can also be thought of as using BAC to “smoothout” the result of TOP, with initial curves provided for each topological feature to allow for independent BAC algorithms to be run. We illustrate that topological information can be used to choose a good starting point of the gradientdescent algorithm. We support the use of our TOP+BAC method by demonstrating its performance using both artificial and real data. For artificial data, we show that TOP+BAC can identify the complex topological structure of the image subject correctly, which enhances the interpretability of the segmentation result. For skin lesion diagnosis, TOP+BAC correctly identifies multiple objects (i.e., connected components) in the image, which can be of practical importance in medical scenarios (Takahashi et al., 2002).
4.2 Future Work
As mentioned above, this work is a primitive attempt to address the question asked by Wasserman (2018) in the setting of image segmentation. There is a lot more to explore following the question discussed therein. One possible direction for future work is the problem of uncertainty quantification of estimated boundaries. To our knowledge, in the setting of multiple objects and a single object with nontrivial topological features, not much work has been done on this. One particular difficulty is that uncertainty arises both in how the boundary is estimated, as well as the topological properties of the boundary. For example, if we draw a confidence band surrounding the segmented contour of two discs, their confidence bands may intersect and create a single large connected component. In addition, BAC is extremely sensitive to numerical issues of gradientdescent when initialized poorly, so there are scenarios in which it will not converge at all, or the output results in the formation of loops on the boundary (see Section 2 in the Supplementary Materials). Thus, it is not straightforward to characterize this change of topological features when quantifying uncertainty. Another issue is that the BAC model (Bryner et al., 2013) is not fully Bayesian, and it produces a point estimate (i.e., one contour). Therefore, it is not obvious how to generalize this method to obtain a full posterior distribution for the segmentation and yield natural uncertainty quantification. To address this issue, we believe that a general, fully Bayesian model should be proposed and studied in the future.
Some interesting ideas arise from the segmentation of Figure 7(a). Note that TOP+BAC yielded two contours which overlapped. This behavior is a result of estimation of pixel densities from the training images, as many of the true segmentations of the skin lesion images are somewhat distant from the portion of the lesion which is darkest. This may be due to human expert bias during segmentation. One may inherently be conservative when drawing a boundary surrounding the lesion, in an attempt to capture the entire lesion and avoid missing a key part of it. Since the active contour algorithm depends on the training data to estimate pixel densities, the final segmentation will also reflect this behavior. In addition, because the ground truth segmentations feature only one connected component, the interior and exterior densities will primarily concentrate at dark and light pixel values, respectively. Finally, the expansion of the two contours to include lighter pixels is also partly due to the lack of separation between the two features. If the two connected components are wellseparated (e.g., Figure 7(b)), this behavior will not occur in the gradientdescent algorithm. To address this issue, one idea is to develop a conditional algorithm, i.e., use a topological segmentation to initialize the largest feature, and run BAC on this feature. Once completed, then BAC can be used on the next largest feature, constrained to the subregion which removes the final contour from the first feature. This is left as future work.
As already pointed out by Gao et al. (2013), there are other alternatives in TDA to handle topological information present in pixel images. In this paper, we use the TOP method by Paris & Durand (2007) which is a representative topological method which focuses on the pixel density estimator for grayscale images, but can be generalized to color images. To the best of our knowledge, there has not been any attempt in the literature to compare the performance and computational costs of different topological methods, or even combine topological methods for such a specific problem. We believe it is of great interest to carry out such a comprehensive study for topological methods in the image segmentation problem.
4.3 Acknowledgments
We would like to acknowledge and thank YuMin Chung for pointing out the skin lesion dataset Codella et al. (2018) to us. We would also like to thank Abhijoy Saha and Sebastian Kurtek for their enthusiastic help and comments on the manuscript.
References
 Abdulrahman et al. (2017) Abdulrahman, Hasan, Magnier, Baptiste, & Montesinos, Philippe. 2017. From contours to ground truth: How to evaluate edge detectors by filtering. Journal of WSCG, 25(2), 133–142.
 Arbelaez et al. (2011) Arbelaez, Pablo, Maire, Michael, Fowlkes, Charless, & Malik, Jitendra. 2011. Contour Detection and Hierarchical Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 33(5), 898–916.
 Bai et al. (2009) Bai, Xiang, Yang, Xingwei, Jan Latecki, Longin, Liu, Wenyu, & Zhuowen, Tu. 2009. Learning context sensitive shape similarity by graph transduction. IEEE Transactions in Pattern Analysis and Machine Intelligence, 32(5), 861–874.
 Bookstein (1986) Bookstein, F. L. 1986. Size and Shape Spaces for Landmark Data in Two Dimensions. Statistical Science, 1(2), 181–222.
 Bryner et al. (2013) Bryner, Darshan, Srivastava, Anuj, & Huynh, Quyen. 2013. Elastic shapes models for improving segmentation of object boundaries in synthetic aperture sonar images. Computer Vision and Image Understanding, 117(12), 1695–1710.
 Carlsson (2009) Carlsson, Gunnar. 2009. Topology and data. Bulletin of the American Mathematical Society, 46.2, 255–308.
 Caselles et al. (1997) Caselles, Vicent, Kimmel, Ron, & Sapiro, Guillermo. 1997. Geodesic active contours. International Journal of Computer Vision, 22.1, 61–79.
 Cheng (1995) Cheng, Yizong. 1995. Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(8), 790–799.
 Codella et al. (2018) Codella, Noel CF, Gutman, David, Celebi, M Emre, Helba, Brian, Marchetti, Michael A, Dusza, Stephen W, Kalloo, Aadi, Liopyris, Konstantinos, Mishra, Nabin, Kittler, Harald, et al. 2018. Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomedical imaging (isbi), hosted by the international skin imaging collaboration (isic). Pages 168–172 of: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). IEEE.
 Dey et al. (2010) Dey, Tamal K., Sun, Jian, & Wang., Yusu. 2010. Approximating loops in a shortest homology basis from point data. In: Proceedings of the Twentysixth Annual Symposium on Computational Geometry. ACM.
 Dryden & Mardia (2016) Dryden, I. L., & Mardia, K. V. 2016. Statistical Shape Analysis: with Applications in R. Second edn. New York: Wiley.
 Gao et al. (2013) Gao, Mingchen, et al. 2013. Segmenting the papillary muscles and the trabeculae from high resolution cardiac CT through restoration of topological handles. In: International Conference on Information Processing in Medical Imaging.
 Huang & Dom (1995) Huang, Qian, & Dom, Byron. 1995. Quantitative methods of evaluating image segmentation. Pages 53–56 of: Proceedings., International Conference on Image Processing, vol. 3. IEEE.
 Jaccard (1901) Jaccard, Paul. 1901. Distribution de la flore alpine dans le bassin des Dranses et dans quelques régions voisines. Bull Soc Vaudoise Sci Nat, 37, 241–272.
 Joshi et al. (2007) Joshi, S. H., Klassen, E., Srivastava, A., & Jermyn, I. H. 2007. A Novel Representation for Riemannian Analysis of Elastic Curves in . Pages 1–7 of: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
 Joshi & Srivastava (2009) Joshi, Shantanu, & Srivastava, Anuj. 2009. Intrinsic Bayesian active contours for extraction of object boundaries in images. International Journal of Computer Vision, 81(3), 331–355.
 Kass et al. (1988) Kass, M., Witkin, A., & Terzopoulos, D. 1988. Snakes: active contour models. International Journal of Computer Vision, 1, 321–331.
 Kendall (1984) Kendall, D. G. 1984. Shape Manifolds, Procrustean Metrics, and Complex Projective Shapes. Bulletin of London Mathematical Society, 16, 81–121.
 Kurtek et al. (2012) Kurtek, S., Srivastava, A., Klassen, E., & Ding, Z. 2012. Statistical Modeling of Curves Using Shapes and Related Features. Journal of the American Statistical Association, 107(499), 1152–1165.
 Letscher & Fritts (2007) Letscher, David, & Fritts, Jason. 2007. Image segmentation using topological persistence. Pages 587–595 of: International Conference on Computer Analysis of Images and Patterns. Springer.
 Marques (2011) Marques, Oge. 2011. Practical image and video processing using MATLAB. John Wiley & Sons.
 Monteiro & Campilho (2006) Monteiro, Fernando C, & Campilho, Aurélio C. 2006. Performance evaluation of image segmentation. Pages 248–259 of: International Conference Image Analysis and Recognition. Springer.
 Mumford & Shah (1989) Mumford, David, & Shah, Jayant. 1989. Optimal approximations by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics, 42(5), 577–685.
 Paris & Durand (2007) Paris, Sylvain, & Durand, Frédo. 2007. A topological approach to hierarchical segmentation using mean shift. In: 2007 IEEE Conference on Computer Vision and Pattern Recognition.
 Singh et al. (2007) Singh, Gurjeet, Mémoli, Facundo, & Carlsson, Gunnar E. 2007. Topological methods for the analysis of high dimensional data sets and 3d object recognition. SPBG.
 Späth (1981) Späth, H. 1981. The minisum location problem for the Jaccard metric. OperationsResearchSpektrum, 3(2), 91–94.
 Srivastava et al. (2011) Srivastava, A., Klassen, E., Joshi, S. H., & Jermyn, I. H. 2011. Shape Analysis of Elastic Curves in Euclidean Spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33, 1415–1428.
 Srivastava & Klassen (2016) Srivastava, Anuj, & Klassen, Eric P. 2016. Functional and Shape Data Analysis. SpringerVerlag.
 Takahashi et al. (2002) Takahashi, K, Kobayashi, S, Matui, R, Yamaguchi, S, & Yamashita, K. 2002. The differences of clinical parameters between small multiple ischemic lesions and single lesion detected by diffusionweighted MRI. Acta neurologica scandinavica, 106(1), 24–29.
 Wasserman (2018) Wasserman, Larry. 2018. Topological data analysis. Annual Review of Statistics and Its Application, 5, 501–532.
 Wu et al. (2017) Wu, Pengxiang, et al. 2017. Optimal topological cycles and their application in cardiac trabeculae restoration. In: International Conference on Information Processing in Medical Imaging. Springer, Cham.
 Younes (1998) Younes, L. 1998. Computable elastic distance between shapes. SIAM Journal of Applied Mathematics, 58(2), 565–586.
 Zahn & Roskies (1972) Zahn, C. T., & Roskies, R. Z. 1972. Fourier Descriptors for Plane Closed Curves. IEEE Transactions on Computers, 21(3), 269–281.
 Zhang (1996) Zhang, Yu Jin. 1996. A survey on evaluation methods for image segmentation. Pattern Recognition, 29(8), 1335–1346.
 Zhu & Yuille (1996) Zhu, Song Chun, & Yuille, Alan. 1996. Region competition: Unifying snakes, region growing, and Bayes/MDL for multiband image segmentation. IEEE Transactions on Pattern Analysis & Machine Intelligence, 9, 884–900.