
# On non-parametric density estimation on linear and non-linear manifolds using generalized Radon transforms

Submitted to the editors DATE. Funding: This work was funded by the Center for Applied Brain and Cognitive Sciences (CABCS) at Tufts University. Shuchin Aeron was supported in part by the NSF CAREER award.

James Webber, Dept. of ECE, Tufts University    Erika Hussey, Center for Applied Brain and Cognitive Sciences (CABCS)    Eric Miller, Dept. of ECE, Tufts University    Shuchin Aeron, Dept. of ECE, Tufts University
###### Abstract

Here we present a new non-parametric approach to density estimation and classification derived from theory in Radon transforms and image reconstruction. We start by constructing a “forward problem” in which the unknown density is mapped to a set of one dimensional empirical distribution functions computed from the raw input data. Interpreting this mapping in terms of Radon-type projections provides an analytical connection between the data and the density with many useful properties, including stable invertibility, fast computation, and significant theoretical grounding. Using results from the literature on geometric inverse problems we give uniqueness results and stability estimates for our methods. We subsequently extend the ideas to address problems in manifold learning and density estimation on manifolds. We introduce two new algorithms which can be readily applied to implement density estimation using Radon transforms in low dimensions or on low dimensional manifolds embedded in $\mathbb{R}^n$. We test our algorithms' performance on a range of synthetic 2-D density estimation problems, designed with a mixture of sharp edges and smooth features. We show that our algorithm can offer significantly improved performance over state–of–the–art Kernel Density Estimators (KDE) in some cases.

Keywords: Radon Transforms, Density Estimation, Inverse Problems.

AMS subject classifications: 68Q25, 68R10, 68U05

## 1 Introduction

Motivation: Consider Figure 1, a point cloud obtained by IID sampling from an unknown density $f$ (here in two dimensions, but we consider the theoretical aspects of the general $n$-dimensional case in this paper). A principal goal in machine learning is to estimate the density $f$ from a given finite sample, under the assumption that $f$ is smooth [15, 8, 14, 18, 20]. The primary contribution of this paper is the development and analysis of methods for recovering $f$ from such a sample set using ideas from the theory of Radon transforms. More specifically, Radon theory provides a highly flexible family of algorithms for stably estimating $f$ from projections of the empirical data set onto low dimensional manifolds such as affine subspaces, spheres, and the like. See figure 1 for an approximation of a half space Radon transform projection using the empirical distribution function (by observation counting in half spaces). Moreover, unlike more common density estimation methods, the Radon-based technique is naturally adapted to densities which may not be smooth. The estimation of one dimensional densities is well studied, both from a theoretical standpoint [30] and in implementation. Indeed there are many effective techniques which can be applied with good results in the one dimensional case [8, 15, 18, 20]. It is well known that the problem of reconstructing a density (of a certain type) from its Radon transform, given enough projection data, is mildly ill–posed [6, page 42], and hence we can expect only a small amplification of the error in an estimate of the projection data when reconstructing the density from it. By combining known approximation techniques for one dimensional density estimation with reconstruction methods from image reconstruction and inverse problems, we propose new techniques in non-parametric density estimation, for which we derive convergence estimates and demonstrate the effectiveness of our methods in implementation on a variety of challenging synthetic density estimation problems.

## 2 Relevant literature

The classical and still most widely used methods for non-parametric density estimation are based on a Kernel Density Estimate (KDE) [15, 8, 14, 1], which approximates the unknown density by its convolution with a known density function (e.g. isotropic Gaussians). Other methods in non-parametric density estimation consider the reconstruction of the density by maximizing the log–likelihood [18, 20] with additional smoothness regularization on the density. However, note that in these works the methods were not extended and tested on higher dimensional data, and density estimation over a low-dimensional manifold, a central theme in modern machine learning, was not considered. Extensions of the KDE approach to density estimation on manifolds (embedded in high dimensions) have been considered in [4, 3, 2].

The approaches based on KDE can fail to capture precise details in the density (e.g. edge effects) due to smoothing, particularly if the bandwidth is too large. To alleviate this issue the use of the hyperplane Radon transform for density estimation has been considered in the literature [21, 22, 23]. In [22] the hyperplane Radon transform is approximated by the application of a smoother to empirical density functions and the filtered backprojection formula is used to reconstruct Gaussian mixture densities in two dimensions. In [21] it is shown that when the Radon transform is approximated by a series of one dimensional kernels, the reconstruction is a type of convolution kernel estimate. Using this idea new kernel density estimators are derived. In [23] a new approach to Gaussian mixture model fitting is presented which uses one dimensional projections of the target density, which are described by the Radon transform, to give a more stable convergence with random initializations than the Expectation Maximization (EM) algorithm.

## 3 Main contributions

This paper advances the use of Radon transforms for non-parametric density estimation in the following novel directions:

• We use new types of empirical distribution functions to approximate generalized Radon transform projections of the density. Here we consider spherical and half space Radon transforms, whereas previous work considered only hyperplane transforms. This allows us to elegantly derive error estimates for our approach and provides the basis for a new, fast method in density estimation which is completely non-parametric.

• While the current methods use the explicit inversion formula for the hyperplane Radon transform (a filtered backprojection approach), we discretize generalized Radon transform operators on pixel grids (assuming a piecewise constant density) and use existing machinery for solving regularized inverse problems with Total Variation (TV) regularization. We choose TV as it is a powerful regularizer commonly applied in image reconstruction which favours piecewise smooth solutions [27, 28]. The idea of using TV for density estimation has so far generally been applied only to the 1-D case, using the likelihood approach [18, 20]. A 2-D example is considered in [20], but we extend the bivariate case significantly and introduce a variety of 2-D test examples in this paper.

• We derive convergence estimates for our approach by combining the theory of Radon transforms [6, page 94] with results on the estimation of one dimensional densities by empirical distributions, as described by the Dvoretzky–Kiefer–Wolfowitz inequality [30]. In [22] convergence estimates are given for a hyperplane approximation if certain assumptions are met. For assumption (c) in 5.1.2. of [22] to hold based on the suggested approximation using truncated Fourier series, we require knowledge of the Sobolev norms of the (unknown) projections to be approximated. Here we give estimates for half space and spherical approximators assuming only a degree of smoothness for the underlying distribution.

• Somewhat in a vein similar to extending KDE methods to density estimation on manifolds [4, 3, 2], we extend the Radon idea from Euclidean space to density estimation on manifolds embedded in $\mathbb{R}^n$ by reconstructing the density on local tangent spaces. This combines the theory of Radon transforms with the theory of manifold sampling and reconstruction, taking inspiration from a number of methods in non-linear dimensionality reduction, such as Locally Linear Embedding [25] and Local Tangent Space Alignment (LTSA) [29, 5]. The main idea is to embed the data locally into low dimensions using these methods and then use the proposed density estimation approach on these patches.

• We present two new algorithms which can be readily applied to implement our density estimation techniques in low dimensional Euclidean space and on low dimensional manifolds embedded in $\mathbb{R}^n$. The codes are available from the authors on request.

## 4 Organisation of the paper

This paper is organized as follows. In section 5 we state some mathematical preliminaries and theory on Radon transforms, and explain the intuitive microlocal idea behind edge detection and reconstruction using Radon transforms. We then introduce our main idea, where we propose to use empirical distribution functions derived from Independent and Identically Distributed (IID) point clouds to approximate Radon transforms of the probability density.

In section 6 we provide error and convergence estimates for our approach by combining theory from geometric inverse problems [6, page 94] and empirical distribution function estimators [30]. We bound the least squares error in our approximate density when using the half space Radon transform in terms of the number of observations $m$ and the dimension $n$. We also give estimates for the expected error in a spherical Radon transform approximation and explain how the error grows with the dimension, the number of projections taken, and the confidence level.

In section 7 we present simulated density reconstructions in two dimensions using an algebraic approach with TV regularization, and give a side by side comparison with a Gaussian kernel estimator. The algebraic approach implemented is formalized in Algorithm 1 for density estimation by inverse Radon transform in low dimensional Euclidean space. Later, in section 7.2, we extend our ideas from Euclidean space in low dimensions to density estimation on low dimensional manifolds embedded in $\mathbb{R}^n$, using theory and techniques in manifold learning [5, 24, 25].

## 5 The main idea and approach

In this section we state some mathematical preliminaries and theory on Radon transforms, and explain briefly their inversion and edge detection properties. We then introduce our main idea for using empirical distribution functions of IID point clouds to approximate Radon transforms of the probability density.

### 5.1 Mathematical preliminaries and Radon Transforms

Throughout this paper $X$ will denote a point cloud in $\mathbb{R}^n$ and $f$ will denote the density from which $X$ is drawn. We also denote:

1. $Z=S^{n-1}\times\mathbb{R}$, the unit cylinder in $\mathbb{R}^{n+1}$.

2. the space of continuously differentiable, compactly supported functions.

3. the unit ball in $\mathbb{R}^n$.

4. $\|\cdot\|_{L^2(Z)}$, the norm for functions on the cylinder.

5. the volume and the surface area of a sphere of radius $r$ in $\mathbb{R}^n$.

###### Definition 1 ([6, page 9])

Let $f$ be a continuously differentiable, compactly supported function on $\mathbb{R}^n$. Then the hyperplane Radon transform of $f$ is defined as

$Rf(s,\theta)=\int_{\mathbb{R}^n}f(x)\,\delta(x\cdot\theta-s)\,dx$, (1)

where $\delta$ denotes the Dirac delta function.

###### Definition 2

Let $f$ be a continuously differentiable, compactly supported function on $\mathbb{R}^n$. Then the spherical (volumetric) Radon transform of $f$ is defined as

$R_Mf(s,x)=\int_0^s r^{n-1}\int_{S^{n-1}}f(x+r\theta)\,d\theta\,dr$. (2)

Here the set of spheres in $\mathbb{R}^n$ is parameterized by $(s,x)$, where $s$ is the radius and $x$ is the center of the sphere. After differentiating with respect to $s$ and dividing by $s^{n-1}$, the transform is equivalent to the spherical means transform defined in [9].

###### Definition 3

Let $f$ be a continuously differentiable, compactly supported function on $\mathbb{R}^n$. Then the half space Radon transform of $f$ is defined as

$R_Hf(s,\theta)=\int_{-\infty}^s\int_{\mathbb{R}^n}f(x)\,\delta(x\cdot\theta-r)\,dx\,dr$, (3)

where $\delta$ denotes the Dirac delta function.

Here the set of hyperplanes in $\mathbb{R}^n$ is parameterized by $(s,\theta)$. See figure 2. After differentiating with respect to the variable $s$, the transform reduces to the classical hyperplane Radon transform $R$.

It is well known [7, page 11, Theorem 2.5] that the hyperplane (and hence half space) Radon transforms have an explicit left inverse and are injective on the class of functions considered here. Similarly, for the spherical Radon transform we can derive explicit reconstruction formulae for the density [9, 10, 11], and the solution is unique if enough spherical centers and radii are known (e.g. for any fixed radius $s$, if all centers $x$ are known, the reconstruction is obtained by spherical deconvolution or deblurring). The previous works of [22] apply the formula of [7, page 11, Theorem 2.5] directly in density estimation. Our reconstruction method instead discretizes the linear operator ($R_H$ or $R_M$) on voxel grids and solves the resulting set of sparse linear equations with regularization.

Radon transforms are known to have edge detection properties [12, 13]. If we know the integrals of the density over a set of hypersurfaces tangent to a given direction in a small neighborhood of a given point, then we can detect jump singularities in the density near that point in that direction (this forms the intuition behind the theory of microlocal analysis applied to Radon transforms). See figure 3. We can see that the jump discontinuities in the density are present in the derivatives (finite differences) of Radon transform (either spherical or half space) approximations, but the projections are noisy. We will see later that we can combat this noise, while effectively preserving edges in the density, by a discrete inversion of the spherical or half space Radon transform with a Total Variation (TV) penalty. TV is a powerful regularizer commonly applied in image reconstruction which favours piecewise smooth solutions.

### 5.2 Application to density estimation: The core idea

Let $X=\{x_1,\dots,x_m\}$ be a point cloud in $\mathbb{R}^n$ drawn as IID samples from a density $f$. Then the half space Radon transform of $f$ can be approximated by the set of one dimensional empirical distribution functions

$R_Hf(s,\theta)=\int_{-\infty}^s\int_{\mathbb{R}^n}f(x)\,\delta(x\cdot\theta-r)\,dx\,dr\approx\frac{1}{m}\sum_{i\,:\,x_i\cdot\theta\leq s}1$. (4)

That is, for every fixed $\theta$, the projection $R_Hf(\cdot,\theta)$ (which is a one dimensional cumulative distribution function) can be approximated by the empirical distribution function of the projection of $X$ onto the line spanned by $\theta$ [22]. Similarly, for the spherical Radon transform we have

$R_Mf(s,x)=\int_0^s r^{n-1}\int_{S^{n-1}}f(x+r\theta)\,d\theta\,dr\approx\frac{1}{m}\sum_{i\,:\,|x_i-x|\leq s}1$. (5)

So by counting the number of observations which lie in a set of half spaces or spheres we can approximate Radon transforms of the density $f$. See figure 4.
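The counting schemes in (4) and (5) are straightforward to implement. The following sketch (in Python with NumPy; the function names are ours, not from the paper's code) computes empirical half space and spherical projections of a point cloud:

```python
import numpy as np

def empirical_halfspace(X, theta, svals):
    """Approximate the half space Radon transform R_H f(s, theta) by the
    fraction of samples in the half space {x : x . theta <= s}."""
    proj = X @ theta                          # project samples onto the line through theta
    return np.array([np.mean(proj <= s) for s in svals])

def empirical_spherical(X, center, svals):
    """Approximate the spherical Radon transform R_M f(s, x) by the
    fraction of samples in the ball of radius s about the chosen center."""
    dist = np.linalg.norm(X - center, axis=1)
    return np.array([np.mean(dist <= s) for s in svals])
```

For fixed $\theta$, the first function is exactly the empirical CDF of the projected sample; both outputs are monotone nondecreasing in $s$ and tend to 1 as $s$ grows.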

We aim to discretize the Radon transform operators above on a pixel grid (assuming a piecewise constant density, as is commonly done in image reconstruction and inverse problems) and solve the resulting sparse system of linear equations to recover the density. Here we will make use of widely applied regularization techniques (such as TV) to combat the noise due to finite sampling, and apply known automated methods in tomography to choose the regularization parameter (such as Generalized Cross Validation (GCV)). First, in the next section, we provide theoretical performance guarantees for our method.

## 6 Error estimates

Here we give error estimates for our method for density estimation, combining results on the stability of inverse Radon transforms and the expected error of empirical distribution functions. Our starting point is the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality [30], which bounds the error in an approximation of a univariate distribution function $F$ by its empirical distribution function $F_m$: for any $\epsilon>0$, $P\left(\sup_{t\in\mathbb{R}}|F_m(t)-F(t)|>\epsilon\right)\leq 2e^{-2m\epsilon^2}$.
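As a quick sanity check on the DKW inequality, the following sketch (our own setup: uniform samples on $[0,1]$, so the true CDF is $F(t)=t$) estimates how often the supremum error of the empirical CDF exceeds the DKW threshold at confidence level $1-p$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, trials = 2000, 0.05, 200
eps = np.sqrt(np.log(2 / p) / (2 * m))      # DKW threshold: 2 exp(-2 m eps^2) = p

violations = 0
for _ in range(trials):
    x = np.sort(rng.uniform(size=m))        # IID U(0,1) samples, so F(t) = t
    # sup |F_m - F| is attained at a sample point, approached from above or below
    upper = np.max(np.arange(1, m + 1) / m - x)
    lower = np.max(x - np.arange(0, m) / m)
    if max(upper, lower) > eps:
        violations += 1

rate = violations / trials                  # DKW guarantees this is at most about p
```

The inequality guarantees the violation probability is at most $p$; the Monte Carlo rate fluctuates around (and below) this value.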

We now have our main theorem which gives bounds for the error in a half space Radon transform approximation from a full set of projected one dimensional empirical distribution functions.

###### Theorem 1

Let $x_1,\dots,x_m$ be continuous, independent and identically distributed random variables in $\mathbb{R}^n$ with probability density function $f$, and let

$g_m(s,\theta)=\frac{1}{m}\sum_{i\,:\,x_i\cdot\theta\leq s}1$

be an approximation to the half space Radon transform $g=R_Hf$. Let $K$ projection directions $\theta_1,\dots,\theta_K$ be uniformly spread over $S^{n-1}$. Then

$\|g-g_m\|^2_{L^2(Z)}\leq w_{1,n-1}\frac{\log\frac{2K}{p}}{m}+\epsilon(K)$ (6)

with probability $1-p$, for any $0<p<1$, where

$\epsilon(K)=\left|\,\|g-g_m\|^2_{L^2(Z)}-\sum_{j=1}^K w_j\|g_{\theta_j}-g_{m,\theta_j}\|^2_{L^2(\mathbb{R})}\right|$

is the error term in approximating the integral over $S^{n-1}$ by a Riemann sum, where the area elements $w_j$ are such that they sum to the surface area of $S^{n-1}$.

###### Proof

The proof is deferred to section A.

###### Corollary 1

Let $x_1,\dots,x_m$ be continuous, independent and identically distributed random variables in $\mathbb{R}^n$ with probability density function $f$, and let

$g_m(s,\theta)=\frac{1}{m}\sum_{i\,:\,x_i\cdot\theta\leq s}1$

be an approximation to the half space Radon transform $g=R_Hf$. Then the worst case error in a reconstruction of $f$ from $g_m$ is

$\|f-f_m\|_{L^2(\mathbb{R}^n)}\leq c(n)\left(w_{1,n-1}\frac{\log\frac{2K}{p}}{m}+\epsilon(K)\right)^{\frac{1}{2(n+2)}}\rho^{1-1/(n+2)}$, (7)

which holds with probability $1-p$. Here $K$ is the number of projections (which are assumed to be uniformly distributed on the sphere), $\rho$ is a bound on the smoothness of $f$ (see section A), and $\epsilon(K)$ is an error term in approximating the integral over $S^{n-1}$ by a finite sum.

###### Proof

The result follows immediately from Theorem 1 and Theorem 12.

Regarding the above results and the error term $\epsilon(K)$: the idea here is that we can pick $K$ large enough (indeed we have, in theory, no restriction on the number of projections taken) to imply a stable inversion (i.e. such that $\epsilon(K)$ is negligible). That is, we can choose the projections so that the inverse problem of reconstructing $f$ from $g_m$ (known for a finite number of projections) is (effectively) mildly ill–posed, and such that there are no image artefacts, in the sense that the singularities of $f$ are stably recovered in all directions [12, 13] (hence the choice of uniformity on $S^{n-1}$ for the set of directions).

Let $K$ (we expect the number of projections needed to adequately cover $S^{n-1}$ to increase exponentially with $n$) be large enough so that the error $\epsilon(K)$ is negligible. For example we could fix $K=k^{n-1}$ for a fixed angular step size on $S^{n-1}$. Then

$\|g-g_m\|^2_{L^2(Z)}\leq w_{1,n-1}\frac{\log\frac{2k^{n-1}}{p}}{m}=\frac{w_{1,n-1}}{m}\left((n-1)\log(k)+\log\left(\frac{2}{p}\right)\right)$ (8)

with probability $1-p$. The bound above explains how we would expect the error in our Radon transform approximation to increase with the dimension $n$, the angular sampling rate $k$ and the confidence level $1-p$. For fixed $n$, $k$ and $p$ we expect $g_m$ to converge to $g$ at a rate of $O(1/\sqrt{m})$ in the $L^2$ norm.

The next theorem describes the least squares error expected in a spherical Radon transform approximation using empirical distribution functions.

###### Theorem 2

Let $x_1,\dots,x_m$ be continuous, independent and identically distributed random variables in $\mathbb{R}^n$ with probability density function $f$, and let

$g_m(s,x)=\frac{1}{m}\sum_{i\,:\,|x_i-x|^2\leq s^2}1$

be an approximation to the spherical Radon transform $g=R_Mf$. Then

$\frac{1}{K}\sum_{j=1}^K\|g_{x_j}-g_{m,x_j}\|^2_{L^2(\mathbb{R}_+)}\leq\frac{1}{m}\left(\log(K)+\log\left(\frac{2}{p}\right)\right)$ (9)

with probability $1-p$, for any finite set of circle centers $x_1,\dots,x_K$. Here $g_{x_j}=g(\cdot,x_j)$ and $g_{m,x_j}=g_m(\cdot,x_j)$.

###### Proof

The proof is deferred to section A. See Theorem 7.

So we would expect the same rate of decrease in the error in a spherical Radon transform approximation as with half space data. The spherical Radon transform is mildly ill–posed and to the same degree as the half space transform if enough spherical centers and radii are known (this can be shown using the theory presented in [9]). So we would expect the same amplification in the error in our solution. In general, if we approximate Radon transform projections (e.g. ellipsoidal, hyperbolic) as empirical distribution functions, we can expect the same rate of convergence, provided that the Radon transforms used have a stable inverse.

## 7 Method and results

Here we present practical methods for density estimation in low dimensions by a discrete inversion of Radon transforms. We test our approach on a number of synthetic densities in two dimensions and give a side by side comparison with a kernel estimator. Later in section 7.2 we expand our approach to density estimation on manifolds and provide an error analysis of the reconstructions of densities on manifold patches of varying levels of curvature and radius.

There are two algorithms presented in this section, which we will now introduce. Algorithm 1 implements density estimation in low dimensions by an explicit, discrete inversion of Radon transforms with regularization. Algorithm 2 applies Algorithm 1 to local tangent spaces of manifolds to reconstruct densities on low dimensional manifolds embedded in .

### 7.1 Density estimation in low dimensions

To implement our density estimation method in low dimensional (e.g. $n=2$) Euclidean space we will approximate Radon transform operators as discrete sums over a uniform grid (which would typically represent image pixels in image reconstruction). Specifically, we aim to minimize the functional

$\|Rv-b\|^2_2+\lambda^2G(v)$, (10)

where $R$ is the discrete form of the Radon transform (either spherical or half space), $b$ is the vector of empirical projection data, and $G$ is a regularization penalty with regularization parameter $\lambda$; for example, $G(v)=\|v\|^2_2$ (Tikhonov) or $G(v)=\|\nabla v\|_1$ (total variation). To solve the least squares problem (10) we apply the iterative solvers of [19], and choose the regularization parameter via a Generalized Cross Validation (GCV) method [32, page 95]. To estimate densities in low dimensions we implement Algorithm 1.
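To make the discretization concrete, here is a minimal sketch (NumPy; the names, grid conventions and the value $\lambda=0.1$ are our own, and we use a simple Tikhonov penalty solved via the normal equations in place of the paper's TV-regularized iterative solver) of building the discrete half space Radon matrix on a pixel grid and solving the regularized least squares problem (10):

```python
import numpy as np

def halfspace_matrix(N, thetas, svals):
    """Discrete half space Radon transform on an N x N pixel grid over [-1,1]^2.
    The row for (theta, s) integrates a piecewise constant density over the
    half space {x : x . theta <= s}."""
    c = (np.arange(N) + 0.5) / N * 2 - 1              # pixel centers in [-1,1]
    PX, PY = np.meshgrid(c, c)
    P = np.stack([PX.ravel(), PY.ravel()], axis=1)    # (N^2, 2) pixel centers
    area = (2.0 / N) ** 2                             # pixel area
    rows = []
    for th in thetas:
        proj = P @ np.array([np.cos(th), np.sin(th)])
        for s in svals:
            rows.append((proj <= s).astype(float) * area)
    return np.array(rows)

def reconstruct(A, b, lam):
    """Minimize ||A v - b||_2^2 + lam^2 ||v||_2^2 via the normal equations
    (Tikhonov; the paper instead uses TV with iterative solvers and GCV)."""
    return np.linalg.solve(A.T @ A + lam ** 2 * np.eye(A.shape[1]), A.T @ b)

# Empirical half space data from an IID sample of the uniform density on [-1,1]^2.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(5000, 2))
thetas = np.linspace(0, np.pi, 30, endpoint=False)
svals = np.linspace(-1.4, 1.4, 30)
A = halfspace_matrix(16, thetas, svals)
b = np.array([np.mean(X @ np.array([np.cos(th), np.sin(th)]) <= s)
              for th in thetas for s in svals])
v = reconstruct(A, b, lam=0.1)                        # density estimate, one value per pixel
```

Swapping the quadratic penalty for TV requires an iterative solver (as in [19]) since the objective is no longer smooth; the data and matrix construction are unchanged.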

To show the effectiveness of our method we will give a side by side comparison with a state–of–the–art non-parametric density estimator which smooths the target density by a convolution with a kernel function

$f(x)=\frac{1}{mh^n}\sum_{i=1}^mJ\left(\frac{x-x_i}{h}\right)$, (11)

where $h$ is the bandwidth and the $x_i$ are IID samples from the target density. Here we will fix the kernel $J$ as an isotropic Gaussian (i.e. we compare with a Gaussian kernel estimator) and choose the bandwidth to be optimal for Gaussian densities [8, page 31]. We automate the choice of both the bandwidth $h$ and the regularization parameter $\lambda$ so that the density estimators are truly non-parametric (in the sense that there are no hyperparameters), and to give a fair comparison between a kernel and a Radon approach.
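For reference, the comparison estimator can be sketched as follows: a plain Gaussian KDE (11) with Silverman's rule-of-thumb bandwidth, which is the bandwidth choice that is optimal for Gaussian densities (the code and function names are ours):

```python
import numpy as np

def silverman_bandwidth(X):
    """Rule-of-thumb bandwidth, optimal for Gaussian target densities."""
    m, d = X.shape
    sigma = np.mean(np.std(X, axis=0))                 # average marginal spread
    return (4.0 / (d + 2)) ** (1.0 / (d + 4)) * m ** (-1.0 / (d + 4)) * sigma

def gaussian_kde(X, points):
    """Evaluate the Gaussian kernel density estimate (11) at the given points."""
    m, d = X.shape
    h = silverman_bandwidth(X)
    norm = m * (h * np.sqrt(2 * np.pi)) ** d           # 1/(m h^d) times kernel constant
    out = np.empty(len(points))
    for j, x in enumerate(points):
        sq = np.sum((X - x) ** 2, axis=1) / h ** 2
        out[j] = np.sum(np.exp(-0.5 * sq)) / norm
    return out
```

Since the kernel is a density and the weights sum to one, the estimate integrates to one by construction, regardless of the bandwidth.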

We consider four test densities, whose probability density functions are displayed in the left hand side of figures 5–8. See figure 11 for a larger view and description of the ground truth densities. See also figure 12 for example IID samples taken from densities 1–4. These will be used to approximate Radon transform data (with the sample sizes as indicated in figure 12) for the reconstructions given in this section. All densities considered are piecewise smooth, some with more edges and more smooth features than others. We use these as example densities to test the effectiveness of our reconstruction method with TV for recovering the jump singularities in the density while also preserving smoothness. Each of the test densities displayed is to be reconstructed on a pixel grid (the density is assumed to be piecewise constant, with one column of the matrix $R$ per pixel). For half space Radon transform data, we sampled the offsets $s$ uniformly and chose the directions $\theta$ to be perfectly uniform on $S^1$, as is most optimal by Theorem 13 and in line with the theory presented in the previous section. For spherical Radon transform data, we sample a range of circle centers and radii. Counting observations in circles of smaller radii, while being a noisier approximation, helps to highlight finer features in the density, whereas the larger radii offer less precise information but with less noise. We find that using a range of radii is most optimal and overall gives better performance than using half space data (see appendix D for a full comparison), which is what we would expect as spherical Radon transform data is overdetermined.

Our method has better performance overall and is less prone to oversmoothing than the Gaussian kernel approach. See table 1 for a comparison of the errors using our approach (either spherical or half space) and a kernel estimate. Let $v$ be the vectorized form of the original density, and let $v_m$ denote an approximation. Then we measure the error by

$\epsilon=100\times\frac{\|v-v_m\|_2}{\|v\|_2}$. (12)
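The error measure (12) is simply the relative $\ell_2$ error expressed as a percentage, for example:

```python
import numpy as np

def relative_error_pct(v, vm):
    """Percentage relative l2 error between the vectorized true density v
    and an estimate vm, as in eq. (12)."""
    return 100.0 * np.linalg.norm(v - vm) / np.linalg.norm(v)
```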

There were 12 density estimations conducted in total (as indicated by table 1). We present 8 of these in the main text; see figures 5–8. For the remaining reconstructions, a full comparison of spherical vs half space data, and some additional analysis, consult appendix D.

We notice that, in particular, the jump singularities (edges) in the original density are better recovered by reconstruction with TV than with a kernel estimate. This is as expected, and the improvement is most pronounced in the reconstruction of density 3. Even in the case of a Gaussian mixture model (density 1), where a Gaussian kernel is a priori the optimal choice, the kernel density estimate again appears oversmoothed when compared to our method. When we reconstruct density 4 from half space data, we notice a significant increase in the error when compared to the reconstructions of densities 1–3 (see table 1). This is due to image artefacts appearing as a singularity on the image edge (see appendix D). This highlights a potential downfall of our approach using half space data due to, in this case, severe artefacts appearing in the reconstruction. This introduces a large increase in the error, and a kernel estimate is preferred in this case. The use of spherical Radon transforms, however, offers consistently improved performance over the other methods considered, and there are no significant artefacts in the reconstructions presented. Based on this experiment, a spherical Radon approach is optimal.

Other methods in the literature focus on fitting a piecewise linear (or constant) density function which maximizes the log likelihood function, with a TV (or similar) penalty applied as regularization [18, 20]. While this has been shown to produce a good piecewise smooth approximation to the density in the univariate case, the multivariate cases are not explored in much depth in theory, simulation or implementation. Hence we choose to compare our method to a kernel estimator, given their wider application and use in multivariate density estimation.

### 7.2 Density estimation on low dimensional manifolds

Here we extend our previous results to density estimation on low dimensional manifolds embedded in $\mathbb{R}^n$, where we will incorporate our previous reconstruction methods and theory, the theorems of [5], and ideas in manifold learning. First, some definitions. Throughout this section, for a subset $M$ of a metric space, we denote by $B^M_r(x)$ the set of points of $M$ within distance $r$ of $x$.

We now have the theorem from [5] which gives bounds for the error in a local tangent space approximation of a manifold in terms of its principal curvatures.

###### Theorem 3

Let $M$ be a submanifold of $\mathbb{R}^n$ whose principal curvatures are bounded by $\kappa$. Then

$d_H\left(B^M_r(x),B^{T_xM}_r(x)\right)\leq\frac{1}{2}\kappa r^2$ (13)

for any $x\in M$ and $r>0$, where $T_xM$ is the tangent space to $M$ at $x$ and $d_H$ denotes the Hausdorff distance.

See figure 9 for a visualisation of a set $B^M_r(x)$.

So far we have dealt with density estimation in low dimensions where the true data dimension is known ($n=2$ in the above simulations), in the sense that the observation sets lie on 2–manifolds in $\mathbb{R}^2$. In many machine learning applications, however, the true data dimension is hidden and we are given a set of observations in $\mathbb{R}^n$ with $n$ large. It is often assumed that the observations lie on a low dimensional manifold which is embedded in $\mathbb{R}^n$; more precisely, every observation is the image of a point on the manifold under an embedding. This is the idea behind the field of “manifold learning”.

Let $M$ be a low dimensional manifold and assume that the data we are given lies on $\phi(M)$, where $\phi$ is an embedding in the general topological sense (a homeomorphism onto its image) into $\mathbb{R}^n$. Let $M$ be described by a collection of charts, where the chart domains are open sets which cover $M$ and each chart is a homeomorphism onto the unit ball in the low dimensional space. Let $f$ be a density on $\phi(M)$. Then composing $f\circ\phi$ with the inverse charts gives the representation of $f$ in low dimensions. If the charting is known, then we can recover densities on $\phi(M)$ using Radon transforms.

More precisely, every point of $M$ lies in some chart domain, and the corresponding low dimensional representation of the density can be reconstructed explicitly from its spherical or half space Radon transform. Once this function is known, we can compose it with the chart and $\phi^{-1}$ to recover the corresponding density value at any point of $\phi(M)$. For a finite sample, the error in the reconstruction of a patch is described by Corollary 1. This is, of course, under the assumption that the charting of $M$ is known, which is often not the case. We aim to tackle this problem in this section.

#### 7.2.1 Local tangent space approximations

Let an atlas of a low dimensional manifold $M$ embedded in $\mathbb{R}^n$ be given by a collection of neighborhoods of points of $M$. By Theorem 3, we can approximate such a neighborhood of $x$ by the tangent patch $B^{T_xM}_r(x)$, provided the principal curvatures of $M$ are low or $r$ is small enough. This is an idea used in techniques in manifold learning and non-linear dimensionality reduction [5, 29]. For finite data we will take subsamples of the data within a ball of radius $r$ (to represent a sample on the patch), for $r$ small enough, and project onto the first principal components. This approximates the projection onto $T_xM$ (or the application of the chart). So here the atlas which describes $M$ is such that each neighborhood is approximately flat. After we have transformed the samples to principal component space (let us denote the projected subsample by $Y$), we can reconstruct the density from which $Y$ is drawn by applying Algorithm 1. The reconstruction in the last step therefore determines an approximation to the density on the neighborhood (or patch). Precisely, we will implement Algorithm 2 to estimate densities on low dimensional manifolds embedded in $\mathbb{R}^n$.
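The tangent space step described above can be sketched as follows (a minimal version of our own: take the subsample in a ball, center it, and project onto the leading principal components via an SVD; the singular value threshold of 0.1 for the crude dimension estimate is an arbitrary choice, not the paper's):

```python
import numpy as np

def tangent_patch_coords(X, x0, r, d=None):
    """Project the samples within distance r of x0 onto the local tangent
    space estimated by PCA, returning low dimensional patch coordinates."""
    nb = X[np.linalg.norm(X - x0, axis=1) <= r]        # subsample on the patch
    C = nb - nb.mean(axis=0)                           # center the patch
    U, S, Vt = np.linalg.svd(C, full_matrices=False)
    if d is None:
        d = int(np.sum(S > 0.1 * S[0]))                # crude intrinsic dimension estimate
    return C @ Vt[:d].T, d
```

Applied to a low curvature patch of a surface in $\mathbb{R}^3$, the third singular value is small relative to the first two, the estimated dimension is 2, and the returned coordinates can be passed directly to the low dimensional density estimator.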

To demonstrate this, consider figure 10, where we have embedded an IID sample of density 1 into three dimensions via

$\phi(x,y)=\left(x,y,\kappa(x^2+y^2)/2\right)$, (14)

so $\phi$ gives the representation of the sample in three dimensions, on the manifold $M=\phi(\mathbb{R}^2)$. Here $\kappa$ is a bound for the principal curvatures of the manifold at the origin. Note that a direct application of Algorithm 1 in $\mathbb{R}^3$ here would be ill advised: the support of the embedded density has measure zero in $\mathbb{R}^3$. In the case highlighted, the curvature of $M$ is low on the patch (whose base point is located at the center of the patch), and the data dimension is recovered correctly as 2. After transforming the subsample on the patch to principal component space we can reconstruct the density on the patch by applying Algorithm 1. See table 2 for the relative errors in reconstructions of the density for a range of $\kappa$ and $r$ values (keeping the $x$ and $y$ coordinates of the center of the patch constant). Here we measure the relative error using equation (12), as in section 7. We see that the relative error increases with the curvature $\kappa$. This is as expected by Theorem 3, as the local tangent space approximations of the manifold become less accurate. We can attempt to deal with this by reducing the ball radius $r$, to counterbalance the increase in the upper bound on the error in local tangent space approximations for increased $\kappa$. However, a reduction in $r$ also serves to lower the sample size on the patch (see table 5), and hence the reconstruction error is further amplified (by Corollary 1), which is evident from the figures in table 2. See also tables 3 and 4 for the errors in the reconstruction of the density using the half space Radon transform and a Gaussian kernel estimator.

## 8 Conclusions and further work

Here we have presented a new approach to non-parametric density estimation inspired by ideas and theory in image reconstruction and Radon transforms. Using classical theory in geometric inverse problems we derived error estimates for our approach in section 6. This combined the classical theory of [30] with that of Natterer in [6], where we bounded the expected convergence rate to the solution in terms of the number of observations $m$ and the dimension $n$. We then went on to develop new density estimation methods in section 7 from known numerical reconstruction algorithms in image reconstruction and inverse problems, whose implementation was described in Algorithm 1. In particular, we found that a discrete inversion of spherical and half space Radon transforms with a total variation regularizer produced satisfactory piecewise smooth approximations to four example densities (with varying amounts of edges and smooth features) in low dimensions ($n=2$), and offered significantly improved performance over a Gaussian kernel estimate in some cases. In particular, our method was less prone to oversmoothing than a Gaussian kernel estimate, and was better at recovering singularities in the density as well as smooth features.

We extended our methods in section 7.2 to reconstruct densities on manifolds embedded in , using ideas from manifold learning and the theory of [5]. The implementation is described by Algorithm 2, which is an extension of Algorithm 1 to the local tangent spaces of manifolds. We gave an analysis of the errors in density reconstructions on manifold patches for varying levels of the principal curvature (at the origin) and patch radius . Our analysis was consistent with the theory of [5], and the higher curvature manifolds proved more troublesome for accurate density estimation. To deal with this, we propose to apply methods in non-linear dimensionality reduction [24, 26, 25] to better "unravel" the manifold patches of higher curvature, and to compare their performance with the linear approximation presented here.

We note that the current work is only suited to density estimation on low dimensional manifolds (mainly due to the practical restrictions of discretizing Radon transform operators in high dimensions). To address this, in further work we aim to develop a filtered backprojection approach for high dimensional data. As the filtering step prescribed by an explicit inversion of hyperplane Radon transform data is ill-posed (more so in high dimensions), we aim to develop a more stable filtering process that is effective in edge reconstruction while offering little amplification of the noise. For spherical Radon transforms we aim to apply the formulae derived by John [31, page 82] for expressing densities in terms of their iterated spherical means, which again will require sufficient regularization.

Throughout this paper we have chosen to focus on the use of empirical distribution functions to approximate spherical and half space Radon transform data. In further work we aim to use more advanced methods in density estimation (such as kernel estimators and those of [18, 20]) to more accurately estimate the projection densities of Radon transforms, and to test for an improvement over the results presented here.

## References

• [1] Fukunaga, Keinosuke, and Larry Hostetler. “The estimation of the gradient of a density function, with applications in pattern recognition.” IEEE Transactions on information theory 21, no. 1 (1975): 32-40.
• [2] Ozakin, Arkadas, and Alexander G. Gray. “Submanifold density estimation.” In Advances in Neural Information Processing Systems, pp. 1375-1382. 2009.
• [3] Henry, Guillermo, and Daniela Rodriguez. “Kernel density estimation on Riemannian manifolds: Asymptotic results.” Journal of Mathematical Imaging and Vision 34, no. 3 (2009): 235-239.
• [4] Kuleshov, Alexander, Alexander Bernstein, and Yury Yanovich. “High-Dimensional Density Estimation for Data Mining Tasks.” In Data Mining Workshops (ICDMW), 2017 IEEE International Conference on, pp. 523-530. IEEE, 2017.
• [5] Fefferman, Charles, Sergei Ivanov, Yaroslav Kurylev, Matti Lassas, and Hariharan Narayanan. “Reconstruction and interpolation of manifolds I: The geometric Whitney problem.” arXiv preprint arXiv:1508.00674 (2015).
• [6] Natterer, Frank. The mathematics of computerized tomography. Vol. 32. SIAM, 1986.
• [7] Natterer, Frank, and Frank Wübbeling. Mathematical methods in image reconstruction. Vol. 5. SIAM, 2001.
• [8] Bowman, Adrian W., and Adelchi Azzalini. Applied smoothing techniques for data analysis: the kernel approach with S-Plus illustrations. Vol. 18. OUP Oxford, 1997.
• [9] Cormack, A. M., and E. T. Quinto. “A Radon transform on spheres through the origin in $\mathbb{R}^n$ and applications to the Darboux equation.” Transactions of the American Mathematical Society 260, no. 2 (1980): 575-581.
• [10] Haltmeier, Markus, and Sunghwan Moon. “The spherical mean Radon transform with centers on cylindrical surfaces.” arXiv preprint arXiv:1501.04326 (2015).
• [11] Haltmeier, Markus. “Exact reconstruction formula for the spherical mean Radon transform on ellipsoids.” Inverse Problems 30, no. 10 (2014): 105006.
• [12] Krishnan, Venkateswaran P., and Eric Todd Quinto. “Microlocal analysis in tomography.” Handbook of mathematical methods in imaging (2015): 847-902.
• [13] Rullgard, Hans, and Eric Todd Quinto. “Local Sobolev estimates of a function by means of its Radon transform.” Inverse Probl. Imaging 4, no. 4 (2010): 721-734.
• [14] Liu, Chang, Jingwei Zhuo, Pengyu Cheng, Ruiyi Zhang, Jun Zhu, and Lawrence Carin. “Accelerated First-order Methods on the Wasserstein Space for Bayesian Inference.” arXiv preprint arXiv:1807.01750 (2018).
• [15] Sheather, Simon J., and Michael C. Jones. “A reliable data-based bandwidth selection method for kernel density estimation.” Journal of the Royal Statistical Society. Series B (Methodological) (1991): 683-690.
• [16] Jebara, Tony. “Multi-task feature and kernel selection for SVMs.” In Proceedings of the twenty-first international conference on Machine learning, p. 55. ACM, 2004.
• [17] Jaakkola, Tommi, Marina Meila, and Tony Jebara. “Maximum entropy discrimination.” In Advances in neural information processing systems, pp. 470-476. 2000.
• [18] Sardy, Sylvain, and Paul Tseng. “Density estimation by total variation penalized likelihood driven by the sparsity information criterion.” Scandinavian Journal of Statistics 37, no. 2 (2010): 321-337.
• [19] Gazzola, Silvia, Per Christian Hansen, and James G. Nagy. “IR Tools: A MATLAB Package of Iterative Regularization Methods and Large-Scale Test Problems.” arXiv preprint arXiv:1712.05602 (2017).
• [20] Koenker, Roger, and Ivan Mizera. “Density estimation by total variation regularization.” In Advances in Statistical Modeling and Inference: Essays in Honor of Kjell A Doksum, pp. 613-633. 2007.
• [21] Panaretos, Victor M., and Kjell Konis. “Non-parametric construction of multivariate kernels.” Journal of the American Statistical Association 107, no. 499 (2012): 1085-1095.
• [22] O’Sullivan, Finbarr, and Yudi Pawitan. “Multidimensional density estimation by tomography.” Journal of the Royal Statistical Society. Series B (Methodological) (1993): 509-521.
• [23] Kolouri, Soheil, Gustavo K. Rohde, and Heiko Hoffmann. “Sliced Wasserstein Distance for Learning Gaussian Mixture Models.” arXiv preprint arXiv:1711.05376 (2017).
• [24] Tenenbaum, Joshua B., Vin De Silva, and John C. Langford. “A global geometric framework for non-linear dimensionality reduction.” Science 290, no. 5500 (2000): 2319-2323.
• [25] Roweis, Sam T., and Lawrence K. Saul. “Non-linear dimensionality reduction by locally linear embedding.” Science 290, no. 5500 (2000): 2323-2326.
• [26] Shawe-Taylor, John, and Nello Cristianini. Kernel methods for pattern analysis. Cambridge university press, 2004.
• [27] Gholami, Ali, and S. Mohammad Hosseini. “A balanced combination of Tikhonov and total variation regularizations for reconstruction of piecewise-smooth signals.” Signal Processing 93, no. 7 (2013): 1945-1960.
• [28] Hansen, Per Christian, and Jakob Heide Jørgensen. “Total Variation and Tomographic Imaging from Projections.” In 36th Conference of the Dutch-Flemish Numerical Analysis Communities. 2011.
• [29] Zhang, Zhenyue, and Hongyuan Zha. “Principal manifolds and non-linear dimensionality reduction via tangent space alignment.” SIAM journal on scientific computing 26, no. 1 (2004): 313-338.
• [30] Dvoretzky, Aryeh, Jack Kiefer, and Jacob Wolfowitz. “Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator.” The Annals of Mathematical Statistics (1956): 642-669.
• [31] John, Fritz. Plane Waves and Spherical Means Applied to Partial Differential Equations. Interscience Publishers, 1955.
• [32] Hansen, Per Christian. Discrete inverse problems: insight and algorithms. Vol. 7. SIAM, 2010.
• [33] Casella, George, and Roger L. Berger. Statistical inference. Vol. 2. Pacific Grove, CA: Duxbury, 2002.
• [34] Adams, Robert A., and John JF Fournier. Sobolev spaces. Vol. 140. Elsevier, 2003.

## Appendix A Proof of Theorems 1 and 2

Here we provide proofs of Theorems 1 and 2, beginning with some preliminary results. The first is a statement of the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality [30], which bounds the error in approximating a univariate distribution function by its empirical distribution function.

###### Theorem 4 (DKW theorem)

Let $x_1,\dots,x_m$ be continuous, independent and identically distributed random variables in $\mathbb{R}$ with cumulative distribution function $F$. Let

$$F_m(x)=\frac{1}{m}\sum_{i=1}^{m}I_{[x_i,\infty)}(x)$$

be the empirical distribution function for $F$, where $I_A$ is the indicator function of a set $A$. Then

$$P\left(\|F_m-F\|_{L^\infty(\mathbb{R})}\geq\epsilon\right)\leq 2e^{-2m\epsilon^2} \tag{15}$$

for any $\epsilon>0$.
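A quick simulation illustrates the inequality (this check is ours, for illustration only): for iid Uniform(0,1) samples the true CDF is $F(x)=x$, and the sup-distance should exceed the DKW threshold $\sqrt{\log(2/p)/(2m)}$ in at most (roughly) a fraction $p$ of trials.

```python
import math, random

# Empirical check of the DKW inequality for iid Uniform(0,1) samples.
random.seed(2)
m, p, trials = 500, 0.05, 200
eps = math.sqrt(math.log(2 / p) / (2 * m))
violations = 0
for _ in range(trials):
    xs = sorted(random.random() for _ in range(m))
    # the sup distance to F(x) = x is attained at sample points (both sides)
    d = max(max((i + 1) / m - x, x - i / m) for i, x in enumerate(xs))
    violations += d >= eps
rate = violations / trials
```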

Next we state Bonferroni’s inequality [33, page 13].

###### Theorem 5 (Bonferroni inequality)

Let $A_1,\dots,A_K$ be a finite set of events in a probability space $(\Omega,\mathcal{F},P)$, so that $A_i\in\mathcal{F}$ for each $i$. Then

$$P\left(\bigcap_{i=1}^{K}A_i\right)\geq\sum_{i=1}^{K}P(A_i)-(K-1). \tag{16}$$
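A small finite check of the inequality (ours, for illustration), on random events over a uniform discrete probability space:

```python
import random

# Check Bonferroni's inequality P(A_1 ∩ ... ∩ A_K) >= sum_i P(A_i) - (K - 1)
# on random events over the uniform probability space Omega = {0, ..., N-1}.
random.seed(3)
N, K = 1000, 5
omega = list(range(N))
# each event keeps each outcome independently with probability 0.9
events = [{x for x in omega if random.random() < 0.9} for _ in range(K)]
inter = set(omega)
for A in events:
    inter &= A                      # running intersection
lhs = len(inter) / N                # P(intersection of the events)
rhs = sum(len(A) / N for A in events) - (K - 1)
```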

We now restate Theorem 1 and provide proof thereafter.

###### Theorem 6

Let $x_1,\dots,x_m$ be continuous, independent and identically distributed random variables in $\mathbb{R}^n$ with probability density function $f$, and let

$$g_m(s,\theta)=\frac{1}{m}\sum_{\{i\,:\,x_i\cdot\theta\leq s\}}1$$

be an approximation to the half space Radon transform $g$. Let $K$ projection directions $\theta_j$ be uniformly spread over $S^{n-1}$. Then

$$\|g-g_m\|^2_{L^2(Z)}\leq\omega_{n-1}\frac{\log\frac{2K}{p}}{m}+\epsilon(K) \tag{17}$$

with probability $1-p$ for any $0<p<1$, where $Z=\mathbb{R}\times S^{n-1}$, $g_{\theta}=g(\cdot,\theta)$ and $g_{m,\theta}=g_m(\cdot,\theta)$, and

$$\epsilon(K)=\left|\;\|g-g_m\|^2_{L^2(Z)}-\sum_{j=1}^{K}w_j\|g_{\theta_j}-g_{m,\theta_j}\|^2_{L^2(\mathbb{R})}\right|$$

is the error term in approximating the integral over $S^{n-1}$ by a Riemann sum, where the area elements $w_j$ are such that $\sum_{j=1}^{K}w_j=\omega_{n-1}$ and $\omega_{n-1}$ is the surface area of $S^{n-1}$.

###### Proof

For every $\theta\in S^{n-1}$, $g_\theta$ is the cumulative distribution function of the one dimensional density of the projections $x\cdot\theta$, where $x\sim f$. The approximation $g_{m,\theta}$ is the empirical distribution function of the projected samples $x_i\cdot\theta$. After rearranging equation (15), the DKW theorem gives

$$P\left(\|g_\theta-g_{m,\theta}\|_{L^\infty(\mathbb{R})}\leq\sqrt{\frac{\log\frac{2}{p}}{2m}}\right)\geq 1-p \tag{18}$$

for any probability $0<p<1$ and $\theta\in S^{n-1}$. It follows that

$$\|g_\theta-g_{m,\theta}\|^2_{L^2(\mathbb{R})}=\int_{-1}^{1}\left(g_\theta(s)-g_{m,\theta}(s)\right)^2\mathrm{d}s\leq 2\|g_\theta-g_{m,\theta}\|^2_{L^\infty(\mathbb{R})}\leq\frac{\log\frac{2}{p}}{m} \tag{19}$$

with probability $1-p$. Given $K$ projections we have

$$\|g_{\theta_j}-g_{m,\theta_j}\|^2_{L^2(\mathbb{R})}\leq\frac{\log\frac{2K}{p}}{m} \tag{20}$$

with probability $1-p$, for all $1\leq j\leq K$, by the Bonferroni inequality. Let the $\theta_j$ be distributed uniformly over $S^{n-1}$. Then we have

$$\|g-g_m\|^2_{L^2(Z)}=\int_{S^{n-1}}\|g_\theta-g_{m,\theta}\|^2_{L^2(\mathbb{R})}\,\mathrm{d}\theta\leq\sum_{j=1}^{K}w_j\|g_{\theta_j}-g_{m,\theta_j}\|^2_{L^2(\mathbb{R})}+\epsilon(K)\leq\frac{\log\frac{2K}{p}}{m}\sum_{j=1}^{K}w_j+\epsilon(K)=\omega_{n-1}\frac{\log\frac{2K}{p}}{m}+\epsilon(K) \tag{21}$$

with probability $1-p$, where $\epsilon(K)$ is the error in approximating the integral in the first step above by a finite sum, and the $w_j$ are small area elements of $S^{n-1}$, with $\sum_{j=1}^{K}w_j=\omega_{n-1}$.
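The estimator $g_m$ of Theorem 6 is cheap to compute from raw data: for each direction, sort the projected samples and read off their empirical CDF on a grid of offsets $s$. A minimal 2-D sketch (the directions, grid, and Gaussian test density are illustrative choices, not the paper's experiments):

```python
import math, random

def empirical_halfspace_transform(samples, thetas, s_grid):
    """g_m(s, theta): the fraction of samples x_i with x_i . theta <= s,
    i.e. the empirical CDF of the projections of the samples onto theta."""
    out = {}
    for theta in thetas:
        proj = sorted(x[0] * math.cos(theta) + x[1] * math.sin(theta)
                      for x in samples)
        m, i, col = len(proj), 0, []
        for s in s_grid:            # s_grid must be increasing
            while i < m and proj[i] <= s:
                i += 1
            col.append(i / m)
        out[theta] = col
    return out

# isotropic Gaussian test samples (illustrative), K = 8 directions
random.seed(4)
pts = [(random.gauss(0, 0.3), random.gauss(0, 0.3)) for _ in range(5000)]
thetas = [k * math.pi / 8 for k in range(8)]
s_grid = [-1 + 0.05 * j for j in range(41)]
g_m = empirical_halfspace_transform(pts, thetas, s_grid)
```

Each column of `g_m` is a monotone empirical CDF; for the isotropic Gaussian every projection has the same distribution, so all columns pass through one half at $s=0$.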

We now restate Theorem 2 and provide proof of the result.

###### Theorem 7

Let $x_1,\dots,x_m$ be continuous, independent and identically distributed random variables in $\mathbb{R}^n$ with probability density function $f$, and let

$$g_m(s,x)=\frac{1}{m}\sum_{\{i\,:\,|x_i-x|\leq s\}}1$$

be an approximation to the spherical Radon transform $g$. Then

$$\frac{1}{K}\sum_{j=1}^{K}\|g_{x_j}-g_{m,x_j}\|^2_{L^2(\mathbb{R}^+)}\leq\frac{1}{m}\left(\log K+\log\frac{2}{p}\right) \tag{22}$$

with probability $1-p$, for any finite set of $K$ circle centres $x_j$. Here $g_{x_j}=g(\cdot,x_j)$ and $g_{m,x_j}=g_m(\cdot,x_j)$.

###### Proof

For each $x_j$, let us extend $g_{x_j}$ and $g_{m,x_j}$ to the real line by defining $g_{x_j}(s)=g_{m,x_j}(s)=0$ for $s<0$. Then $g_{m,x_j}$ is the empirical distribution function of the distances $|x_i-x_j|$, and we have

$$\|g_{x_j}-g_{m,x_j}\|^2_{L^2(\mathbb{R})}\leq 2\|g_{x_j}-g_{m,x_j}\|^2_{L^\infty(\mathbb{R})}\leq\frac{\log\frac{2K}{p}}{m} \tag{23}$$

with probability $1-p$, for all $1\leq j\leq K$, by the DKW theorem and Bonferroni's inequality. The result follows.
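The spherical-mean analogue of the estimator is computed the same way, with distances to a centre in place of projections. A short 2-D sketch (the centres, grid, and uniform test density are illustrative choices):

```python
import math, random

def empirical_spherical_transform(samples, centres, s_grid):
    """g_m(s, x): the fraction of samples x_i with |x_i - x| <= s, i.e. the
    empirical CDF of the distances of the samples to the centre x."""
    out = {}
    for c in centres:
        dists = sorted(math.dist(p, c) for p in samples)
        m, i, col = len(dists), 0, []
        for s in s_grid:            # s_grid must be increasing
            while i < m and dists[i] <= s:
                i += 1
            col.append(i / m)
        out[c] = col
    return out

# uniform samples on the square [-1, 1]^2 (illustrative test density)
random.seed(5)
pts = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(4000)]
centres = [(0.0, 0.0), (0.5, 0.5)]
s_grid = [0.1 * j for j in range(31)]
g = empirical_spherical_transform(pts, centres, s_grid)
```

For the centre at the origin, $g_m(1,0)$ should be close to the area fraction $\pi/4$ of the unit disc inside the square.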

## Appendix B Results on the stability of Radon transforms

Here we provide background theory and results regarding the stability of inverse Radon transforms. First we have some definitions.

###### Definition 4

Let $f\in L^1(\mathbb{R}^n)$. Then we define the Fourier transform of $f$ by

$$\hat{f}(\xi)=(2\pi)^{-n/2}\int_{\mathbb{R}^n}f(x)e^{-ix\cdot\xi}\,\mathrm{d}x. \tag{24}$$

From [34, page 59] we have the definitions of Sobolev spaces.

###### Definition 5

Let $\alpha\in\mathbb{R}$. Then we define the Sobolev space of degree $\alpha$ as

$$H^{\alpha}(\mathbb{R}^n)=\left\{\text{tempered distributions }f:(1+|\xi|^2)^{\alpha/2}\hat{f}(\xi)\in L^2(\mathbb{R}^n)\right\} \tag{25}$$

with the norm

$$\|f\|^2_{H^{\alpha}(\mathbb{R}^n)}=\int_{\mathbb{R}^n}(1+|\xi|^2)^{\alpha}|\hat{f}(\xi)|^2\,\mathrm{d}\xi. \tag{26}$$

Similarly, a function $g$ on the cylinder $Z=\mathbb{R}\times S^{n-1}$ belongs to $H^{\alpha}(Z)$ if the norm

$$\|g\|^2_{H^{\alpha}(Z)}=\int_{S^{n-1}}\int_{\mathbb{R}}(1+\sigma^2)^{\alpha}|\hat{g}(\sigma,\theta)|^2\,\mathrm{d}\sigma\,\mathrm{d}\theta \tag{27}$$

exists and is finite. Here the Fourier transform $\hat{g}$ is taken with respect to the $s$ variable.

Let $\Omega$ be an open subset of $\mathbb{R}^n$. Then we define

$$H^{\alpha}(\Omega)=\left\{f\in H^{\alpha}(\mathbb{R}^n):\mathrm{supp}(f)\subseteq\overline{\Omega}\right\}. \tag{28}$$

Next from [6, page 11] we have

###### Theorem 8 (Fourier slice theorem)

Let $f\in L^1(\mathbb{R}^n)$. Then

$$\widehat{Rf}(\sigma,\theta)=(2\pi)^{(n-1)/2}\hat{f}(\sigma\theta), \tag{29}$$

where the Fourier transform on the left-hand side is taken with respect to the $s$ variable.
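The slice identity (29) can be checked numerically under the $(2\pi)^{-n/2}$ convention of Definition 4. Below is a small check (ours, for illustration) for $n=2$ with the standard Gaussian density, whose projection onto any direction $\theta$ is the 1-D standard Gaussian density; the quadrature grids and evaluation point are arbitrary choices.

```python
import math

def ft1(g, sigma, a=-8.0, b=8.0, n=1600):
    """1-D transform (2*pi)^(-1/2) * integral g(s) e^{-i s sigma} ds,
    real part only (g is even), by the midpoint rule."""
    h = (b - a) / n
    tot = sum(g(a + (k + 0.5) * h) * math.cos((a + (k + 0.5) * h) * sigma)
              for k in range(n))
    return tot * h / math.sqrt(2 * math.pi)

def ft2(f, xi1, xi2, a=-8.0, b=8.0, n=220):
    """2-D transform (2*pi)^(-1) * double integral f(x) e^{-i x.xi} dx,
    real part only, by the midpoint rule."""
    h = (b - a) / n
    tot = 0.0
    for j in range(n):
        x1 = a + (j + 0.5) * h
        for k in range(n):
            x2 = a + (k + 0.5) * h
            tot += f(x1, x2) * math.cos(x1 * xi1 + x2 * xi2)
    return tot * h * h / (2 * math.pi)

f = lambda x1, x2: math.exp(-(x1 * x1 + x2 * x2) / 2) / (2 * math.pi)
Rf = lambda s: math.exp(-s * s / 2) / math.sqrt(2 * math.pi)  # projection of f

sigma, theta = 1.3, math.pi / 5
lhs = ft1(Rf, sigma)                               # FT of the projection
rhs = math.sqrt(2 * math.pi) * ft2(f, sigma * math.cos(theta),
                                   sigma * math.sin(theta))
```

Both sides should equal $(2\pi)^{-1/2}e^{-\sigma^2/2}$, and they agree to quadrature accuracy, independent of the direction $\theta$, since the test density is radial.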

From Natterer [6, page 42], we have:

For each