Bayesian Density Estimation via Multiple Sequential Inversions of 2D Images with Application in Electron Microscopy
Authors’ affiliations:
Associate Research Fellow, Department of Statistics, University of Warwick, Coventry CV4 7AL, U.K. (corresponding author);
Lecturer of Statistics, Department of Mathematics, University of Leicester, Leicester LE1 7RH, U.K.;
Research Biostatistics Group Head, Novartis Vaccines and Diagnostics, and Associate Fellow, Department of Statistics, University of Warwick;
Graduate Student, Emerging Technologies Research Centre, De Montfort University;
Lecturer, Department of Physics, University of Warwick;
Head, Emerging Technologies Research Centre, De Montfort University
Abstract
We present a new Bayesian methodology to learn the unknown material density of a given sample by inverting its two-dimensional images that are taken with a Scanning Electron Microscope. An image results from a sequence of projections of the convolution of the density function with the unknown microscopy correction function, which we also learn from the data. We invoke a novel design of experiment, involving imaging at multiple values of the parameter that controls the sub-surface depth from which information about the density structure is carried to the image. Real-life material density functions are characterised by high density contrasts and are typically highly discontinuous, implying that they exhibit correlation structures that do not vary smoothly. In the absence of training data, modelling such correlation structures of real material density functions is not possible. So we discretise the material sample and treat values of the density function at chosen locations inside it as independent and distribution-free parameters. The resolution of the available images dictates the discretisation length of the model; three models pertaining to distinct resolution classes are developed. We develop priors on the material density such that these priors adapt to the sparsity inherent in the density function. The likelihood is defined in terms of the distance between the convolution of the unknown functions and the image data. The posterior probability density of the unknowns given the data is expressed using the developed priors on the density and priors on the microscopy correction function as elicited from the microscopy literature. We achieve posterior samples using an adaptive Metropolis-within-Gibbs inference scheme. The method is applied to learn the material density of a 3D sample of a nano-structure using real image data. Illustrations on simulated image data of alloy samples are also included.
Keywords: Bayesian methods, Inverse Problems, Nonparametric methods, Priors on sparsity, Underdetermined problems
1 Introduction
Non-destructive learning of the full three-dimensional material density function in the bulk of an object, using available two-dimensional images of the object, is an example of a standard inverse problem (?; ?; ?; ?; ?). The image results from the projection of the three-dimensional material density onto the image plane. However, inverting the image data does not in general lead to a unique density function; in fact, to render the inversion unique, further measurements need to be invoked. For instance, the angle at which the object is imaged or viewed is varied to provide a series of images, thereby expanding the available information. This allows us to achieve distinguishability (or identifiability) amongst the solutions for the material density. A real-life example of such a situation is presented by the pursuit of the material density function using noisy two-dimensional (2D) images taken with electron microscopy techniques (?). Such non-invasive and non-destructive 3D density modelling of material samples is often pursued to learn the structure of the material in its depth (?), with the ulterior aim of controlling the experimental conditions under which material samples of desired qualities are grown.
Formally, the projection of a density function onto a lower-dimensional image space is referred to as the Radon Transform; see ?, ?. The inverse of this projection is also defined, but requires the viewing angle as an input and, secondly, involves taking the spatial derivative of the density function, rendering the computation of the inverse projection numerically unstable if the image data is not continuous, if the data comprises limited-angle images (?; ?), or if noise contaminates the data (?). Furthermore, in the absence of measurements of the viewing angle, the implementation of this inverse is not directly possible, as ? suggested. Even when the viewing angle is in principle measurable, logistical difficulties in imaging at multiple angles result in limited-angle images. For example, in laboratory settings, such as when imaging with Scanning Electron Microscopes (SEMs), the viewing angle is varied by remounting the sample on stubs that are differently inclined each time. This mounting and remounting of the sample is labour-intensive, and such action breaks the vacuum within which imaging needs to be undertaken, causing long waiting periods between two consecutive images. When the vacuum is restored and the next image is ready to be taken, it is very difficult to identify the exact fractional area of the sample surface that was imaged the last time, in order to scan the beam over that very area. In fact, the microscopist would prefer an imaging technique that allows for imaging without needing to break the vacuum in the chamber at all. Such logistical details are restrictive in that they allow imaging at only a small number of viewing angles, which can cause the 3D material density reconstruction to be numerically unstable. This problem is all the more acute when the data is discontinuous and heterogeneous in nature.
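The non-uniqueness induced by limited projection data can be seen in a toy linear sketch. The row/column-sum operator and all numbers below are illustrative stand-ins, not the paper's projection operator: with fewer projection constraints than voxels, the Moore-Penrose inverse fits the data exactly yet fails to recover the density.

```python
import numpy as np

# Toy illustration (not the paper's model): a 4x4 "density" on a grid,
# observed only through its row and column sums -- an underdetermined
# linear projection, loosely analogous to limited-angle tomography.
n = 4
rng = np.random.default_rng(0)
density = rng.uniform(0.0, 1.0, size=(n, n))

# Forward operator A: maps the 16 voxel densities to 8 projection sums.
A = np.zeros((2 * n, n * n))
for i in range(n):
    A[i, i * n:(i + 1) * n] = 1.0   # row sums
    A[n + i, i::n] = 1.0            # column sums
data = A @ density.ravel()

# The Moore-Penrose inverse gives the minimum-norm solution: it fits the
# 8 data points exactly, but 8 equations cannot pin down 16 unknowns.
recovered = np.linalg.pinv(A) @ data
residual = np.linalg.norm(A @ recovered - data)       # data fit exactly...
error = np.linalg.norm(recovered - density.ravel())   # ...but density is not
print(residual, error)
```

The residual is at machine precision while the reconstruction error stays large, which is the sense in which further measurements (here, more projection directions; in the paper, more beam energies) are needed for identifiability.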
Indeed, with the help of ingenious imaging techniques such as compressive sensing (?), the requirement of a large number of image data points is potentially mitigated. If implemented, compressive imaging would imply averaging the image data over a chosen few pixels, with respect to a known non-adaptive kernel. The inversion of such compressed data would then require the computation of one more averaging or projection (and therefore one more integration) than is otherwise relevant. While this is easily done, the implementation of compressive sensing of electron microscopy data can be made possible only after detailed instrumental additions to the imaging apparatus are made. Such instrumental additions are outside the scope of our work. We thus invoke a novel design of imaging experiment that involves imaging at multiple values of some relevant parameter that, unlike the viewing angle, is easily varied in a continuous way over a chosen range. In this paper, we present such an imaging strategy that allows for multiple images to be recorded, at each value of this parameter, when a cuboidal slab of a material sample is being imaged with an SEM.
In imaging with an SEM, the projection of the density function is averaged over a 3D region inside the material, the volume of which we consider known; such a volume is indicated in the schematic diagram of this imaging technique shown in Figure 1. Within this region, a pre-fixed fraction of the atomistic interactions between the material and the incident electron beam stay confined (?; ?). The 2D SEM images are taken in one of the different types of radiation that are generated as a result of the atomistic interactions between the molecules in the material and a beam of electrons that is made incident on the sample, as part of the experimental setup that characterises imaging with SEM (?; ?; ?). Images taken with bulk electron microscopy techniques potentially carry information about the structure of the material sample under its surface, as distinguished from images obtained by radiation that is reflected off the surface of the sample, or is coming from near the surface. (Footnote 1: Since the radiation generated in an elemental three-dimensional volume inside the bulk of the material sample is due to the interaction of the electron beam and the material density in that volume, it is assumed that the material density is proportional to the radiation density generated in an elemental 3D volume. The radiation generated in an elemental volume inside the bulk of the material is projected along the direction of the viewing angle chosen by the practitioner to produce the observed 2D image.)
In practice, the situation is rendered more complicated by the fact that for the 2D images to be formed, it is not just the material density function, but the convolution of the material density function with a kernel, that is projected onto the image space. The nature of the modulation introduced by this kernel is then also of interest, in order to disentangle its effect from that of the unknown density function that is responsible for generating the measured image. As for the modulation, both enhancement and depletion of the generated radiation are considered to occur in general. It is to be noted that this kernel is not the measure of blurring of a point object in the imaging technique, i.e. it is not the point spread function (PSF) of the observed image, but is characteristic of the material at hand. The kernel really represents the unknown microscopy correction function that convolves with the material density function; this convolution is projected to form the image data, which gets blurred owing to specifics of the imaging apparatus. Learning the deblurred image from the PSF-convolved image data is an independent pursuit, referred to as blind deconvolution (?; ?; ?). Here our aim is to learn the material density and the microscopy correction function using such noisy image data.
The convolution of the unknown kernel with the unknown density is projected first onto the 2D surface of the system, and the resulting projection is then successively marginalised over the two spatial coordinates that span the plane of the system surface. Thus, three successive projections result in the image data. Here we deal with the case of an image resulting from a composition of a sequence of three independent projections of the convolution of the material density function and the unknown kernel or microscopy function. So only a sequence of inversions of the image allows us to learn this density. Such multiple inversions are seldom addressed in the literature, and here we develop a Bayesian method that allows for such successive inversions of the observed noisy, PSF-convolved 2D images, in order to learn the kernel as well as the density function. The simultaneous pursuit of both the unknown density and the kernel is less often addressed than are reported attempts at density reconstruction under an assumed model for the kernel (?; ?; ?; ?).
We focus here on the learning of a density function that is not necessarily convex, is either sparse or dense, and is often multimodal, with the modes characterised by individualised, discontinuous substructure and abrupt bounds, resulting in the material density function being marked by isolated islands of over-density and sharp density contrasts; we shall see this below in examples of reconstructed density functions presented in Section 9 and Section 10. Neither the isolated modes of the material density function in real-life material samples nor the adjacent bands of material over-density can be satisfactorily modelled with a mixture model. Again, modelling of such a real-life trivariate density function with a high-dimensional Gaussian Process (GP) is not possible in the absence of training data, given that the covariance function of the invoked GP would then have to be non-stationary (?) and exhibit non-smooth spatial variation. Here, by "training data" is implied data comprising a set of known values of the material density at chosen points inside the sample; no such known values are available in our problem. Even if the covariance function varied smoothly over the 3D spatial domain, its parametrisation would require training data; with an abruptly evolving covariance function, as in the problem of inverting SEM image data, such parametrisation becomes impossible (?) when no training data is available. At the same time, the blind implementation of the inverse Radon Transform would render the solution unstable, given that the distribution of the image data in real-life systems is typically highly discontinuous.
In the following section (Section 2), we describe the experimental setup, delineating the problem of inversion of the 2D images taken with electron microscopy imaging techniques, with the aim of learning the sub-surface material density and the kernel. In addition, we present the novel design of imaging experiment that achieves multiple images. In Section 3 we discuss the outstanding difficulties of multiple and successive inversions and qualitatively explain the relevance of the advanced solution to this problem. This includes a discussion of the integral representation of the multiple projections (Section 3.2), the outline of the Bayesian inferential scheme adopted here (Section 3.6), and a section on the treatment of the low-rank component of the unknown material density (Section 3.5). The details of the microscopy image data that affect the modelling at hand are presented in the subsequent section. This includes a description of the three models developed to deal with the three classes of image resolution typically encountered in electron microscopy (Section 3.3) and measurement uncertainties of the collected image data (Section 3.4). Two models of the kernel, as motivated by low and high levels of information available on the kernel given the material at hand, are discussed in Section 4; priors on the parameters of the respective models are discussed there as well. Section 5 is devoted to the development of priors on the unknown density function such that the priors are adaptive to the sparsity in the density. In Section 6, the discretised form of the successive projections of the convolution of the unknown functions is given for the three different models. Inference is discussed in Section 7. Section 8 discusses salient aspects of the uniqueness of the solutions. Application of the method to the analysis of real Scanning Electron Microscope image data is included in Section 10, while results from the inversion of simulated image data are presented in Section 9.
Relevant aspects of the methodology and possible other reallife applications in the physical sciences are discussed in Section 11.
2 Application to Electron Microscopy Image Data
For our application, the system, i.e. the slab of material, is modelled as a rectangular cuboid such that the surface at which the electron beam is incident is the =0 plane. Thus, the surface of the system is spanned by the orthogonal and axes, each of which is also orthogonal to the axis that spans the depth of the slab. In our problem, the unknown functions are the material density and the kernel.
Here we learn the unknown functions by inverting the image data obtained with a Scanning Electron Microscope. Though the learning of the convolution of and is in principle ill-posed, we suggest a novel design of imaging experiment that allows for an expansion of the available information, leading to being learnt uniquely when noise in the data is small (Section 8). This is achieved by imaging each part of the material sample at different values of a parameter such that images taken at different values of this parameter carry information about the density function from different depths under the sample surface. Such is possible if represents the energy of the electrons in the incident electron beam that is used to take the SEM image of the system; since the sub-surface penetration depth of the electron beam increases with electron energy, images taken with beams of different energies inherently bear information about the structure of the material at different depths.
The imaging instrument does not image the whole sample at the same time. Rather, the imaging technique that is relevant to this problem is characterised by imaging different parts of the sample successively in time. At each of these discrete time points, the th part () of the sample is viewed by the imaging instrument along the viewing vector, to record the image datum in the th pixel on the imaging screen. Thus, every pixel of image data represents information about a part of the sample. The th viewing vector corresponds to the th point of incidence of the electron beam on the surface of the material sample. The image datum recorded in the th pixel then results from this viewing, and harbours information about the sample structure inside the th interaction-volume, which is the region that bounds atomistic interactions between the incident electron beam and molecules of the material (see Figure 1). The point of incidence of the th viewing vector, i.e. the centre of the th interaction-volume, is marked in the diagram. The volume of any such 3D interaction-volume is known from microscopy theory (?) and is a function of the energy of the beam electrons. In fact, motivated by the microscopy literature (?), we model the shape of the th interaction-volume as hemispherical, centred at the incidence point of , with the radius of this hemisphere modelled as . We image each part of the sample at different values of , such that the th value of is , . To summarise, the data comprise 2D images, each image being a square spatial array of pixels such that at the th pixel, for the th value of , the image datum is , where . Consideration of this full set of images will then suggest the sub-surface density function of the sample, in a fully discretised model. The convolution of and is projected onto the system surface, i.e.
the =0 plane, and this projection is then further projected onto one of the or axes, with the resulting projection being projected once again, onto the central point of the interaction-volume created by the th incidence of the beam of energy , to give rise to the image datum in the th pixel of the th image.
Expanding information by imaging at multiple values of is preferred to imaging at multiple viewing angles for the logistical reasons discussed above in Section 1. Further, the shape of the interaction-volume is rendered asymmetric about the line of incidence of the beam when the beam is tilted relative to the vertical (?, Figure 3.20, page 92), where the asymmetry is dependent on the tilt angle and the material. Then it is no longer possible to have confidence in any parametric model of the geometry of the interaction-volume as given in microscopy theory. Knowledge of the symmetry of the interaction-volume is of crucial importance to any attempt at inverting SEM image data with the aim of learning the material density function. So in this application, varying the tilt angle would have to be accompanied by approximations in the modelling of the projection of . We wish to avoid this and therefore resort to imaging at varying values of .
Physically speaking, the unknown density would be the material density, representing the amount or mass of material per unit 3D volume. The measured image datum could be the radiation in back-scattered electrons or X-rays, in each 2D pixel, as measured by a Scanning Electron Microscope.
3 Modelling
The general inverse problem is where the data , while the unknown function , with . In particular, 3D modelling involves the case of , . The case of is a fundamentally ill-posed problem (?; ?). Here is the measurement noise, the distribution of which we assume known.
In our application, the data is represented as the projection of the convolution of the unknown functions as:
where the projection operator is a contractive projection from the space that lives in  which is  onto the image space . itself is a composition of 3 independent projections in general,
where is the projection onto the =0 plane, followed by the projection onto the =0 axis, followed by the projection onto the centre of a known material-dependent three-dimensional region inside the system, namely the interaction-volume. These three projections commute, resulting in an effective contractive projection of onto the centre of this interaction-volume. Thus, the learning of requires multiple (three) inversions of the image data. In this sense, this is a harder than usual inversion problem. The interaction-volume and its centre, at which point the electron beam is incident, are shown in Figure 1.
It is possible to reduce the learning of to a least-squares problem in the low-noise limit, rendering the inverse learning of unique by invoking the Moore-Penrose inverse of the matrix representation of the operator (Section 8). The learning of and individually, from the uniquely learnt , is still an ill-posed problem. In a non-Bayesian framework, a comparison of the number of unknowns with the number of measured parameters is relevant; imaging at values of suggests that parameters are known while at most are unknown. Thus, for the typical values of and used in applications, the ratio of known to unknown parameters is , using the aforementioned design of experiment (see Section 8.1). The unknown parameters still render the individual learning of and non-unique. Such considerations are, however, relevant only in the absence of Bayesian spatial regularisation (?; ?), and as we will soon see, the lack of smooth variation in the spatial correlation structure underlying typically discontinuous real-life material density functions renders the undertaking of such regularisation risky.
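The knowns-versus-unknowns bookkeeping above can be sketched with hypothetical numbers; the actual counts depend on the quantities defined in Section 8.1, so the figures below are assumptions for illustration only.

```python
# Hypothetical bookkeeping of knowns vs unknowns under the multi-energy
# design (illustrative numbers, not those of the real application).
n_pix = 50 * 50        # pixels per image (assumed square image)
n_eng = 10             # number of beam energies imaged at
known = n_eng * n_pix  # one image datum per pixel per energy

# One density parameter per voxel: one depth slab learnable per energy,
# per beam pointing, plus (say) a handful of kernel parameters.
n_density = n_eng * n_pix
n_kernel = 5
unknown = n_density + n_kernel

print(known, unknown, known / unknown)
```

The ratio of knowns to unknowns sits just below 1, so without priors or regularisation the separate learning of density and kernel stays non-unique, as the text notes.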
In the Bayesian framework, however, we seek the posterior probability of the unknowns given the image data. Thus, in this approach there is no need to directly invert the sequential projection operator ; the variance of the posterior probability density of the unknowns given the data is crucially controlled by the strength of the priors on the unknowns (?; ?). We develop the priors on the unknowns using as much information as possible. Priors on the kernel can be weak or strong, depending on the information available in the literature relevant to the imaged system. Thus, in the absence of strong priors that can be elicited from the literature, a distribution-free model for is motivated, characterised by weak priors on the shape of this unknown function. On the contrary, if the shape is better known in the literature for the material at hand, the case is illustrated by considering a parametric model for , in which priors are imposed on the parameters governing this chosen shape. Also, the material density function can be sparse or dense. To help with this, we need to develop a prior structure on that adapts to the sparsity of the density in its native space. It is in this context that such a problem of multiple inversions is well addressed in the Bayesian framework.
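As a minimal sketch of the inference machinery invoked later (Section 7), the following runs an adaptive Metropolis-within-Gibbs sampler on a toy two-parameter posterior. The standard-normal target, the step-size adaptation rule, and all tuning constants are assumptions for illustration; they are not the paper's scheme.

```python
import numpy as np

# Adaptive Metropolis-within-Gibbs on a toy 2-parameter posterior
# (a standard bivariate normal stands in for the real posterior over
# density and kernel parameters).
rng = np.random.default_rng(1)

def log_post(theta):
    return -0.5 * np.sum(theta ** 2)   # toy log-posterior

theta = np.zeros(2)
step = np.ones(2)            # per-coordinate proposal scales
accepts = np.zeros(2)
samples = []
n_iter = 4000
for t in range(1, n_iter + 1):
    for j in range(2):       # Gibbs loop: update one coordinate at a time
        prop = theta.copy()
        prop[j] += step[j] * rng.normal()
        if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
            theta = prop
            accepts[j] += 1
        # Crude (non-diminishing) adaptation: steer the acceptance rate
        # towards roughly 0.44, the usual 1D Metropolis target.
        if t % 100 == 0:
            rate = accepts[j] / t
            step[j] *= np.exp(0.1 if rate > 0.44 else -0.1)
    samples.append(theta.copy())

samples = np.array(samples[1000:])   # discard burn-in
print(samples.mean(axis=0), samples.std(axis=0))
```

The sample means and standard deviations land near (0, 0) and (1, 1), the moments of the toy target; a production sampler would use diminishing adaptation to preserve ergodicity.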
3.1 Defining a general voxel
Lastly, we realise that the data at hand are intrinsically discrete, owing to the imaging mechanism. Then,

- for a given beam energy, collected image data is characterised by a resolution that depends on the size of the electron beam that is used to take the image, as well as the details of the particular instrument that is employed for the measurements. The SEM cannot "see" variation in structure at lengths smaller than the beam size. In practice, the resolution is typically coarser than the beam size; is given by the microscopist. Then only one image datum is available over an interval of width along each of the and axes, i.e. only one datum is available from a square of edge , located at the beam pointing. This implies that using such data, no variation in the material density can be estimated within any square (lying on the plane) of side , located at a beam pointing, i.e. , where , . The reason for not being able to predict the density over lengths smaller than , in typical real-life material samples, is discussed below.

- image data are recorded at discrete (chosen) values of the beam energy, such that for a given beam pointing, a beam at a given value of carries information about the material density from within the sub-surface depth of . Then two beams at successive values and of allow for estimation of the material density within the depth interval that is bounded by the beam penetration depths. Here , =0. This implies that for a given beam pointing, over any such interval, no variation in the material density can be learnt from the data, i.e. where , . Again, the reason for the inability to predict the density at any is explained below.
As is evident in these two bulleted points above, we are attempting to learn the density at discrete points identified by the resolution in the data. Prediction of the density at any point inside the sample could be possible in alternative models, for example when a Gaussian Process (GP) prior is used to model the trivariate density function that is typically marked by disjoint support and/or by sharp density contrasts (seen in a real density as in Figure 10, and in those learnt from simulated image data, as in Figure 5). While the covariance structure of such a GP would be non-stationary (as motivated in the introductory section), the highly discontinuous nature of real-life density functions would compel this non-stationary covariance function to not vary smoothly. Albeit difficult, the parametrisation of such covariance kernels can in principle be modelled using training data, except that no such training data (comprising known values of the density at chosen points inside the given 3D sample) is available. As training data is not at hand, parametrisation of the non-stationary covariance structure is not possible, leading us to resort to a fully discretised model in which we learn the density function at points inside the sample, as determined by the data resolution. In fact, the learnt density function could then be used as training data in a GP-based modelling scheme to help learn the covariance structure. Such an attempt is retained for future consideration.
Thus, prediction would involve (Bayesian) spatial regularisation, which demands modelling of the abruptly varying spatial correlation structure that underlies the highly discontinuous, trivariate density function of real-life material samples (see Figure 10). The spatial variation in the correlation structure of the image data cannot be used as a proxy for that of the material density function, since the former is not a pointer to the latter, given that compression of the density information within a 3D interaction-volume generates an image datum. Only with training data comprising known material density at chosen points can we model the correlation structure of the density function. In the absence of training data, as in the current case, modelling of such correlation is likely to fail.
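The impossibility of learning sub-resolution variation can be illustrated directly (all numbers hypothetical): two different sub-pixel density patterns with the same within-pixel average produce identical image data, so the data cannot distinguish them.

```python
import numpy as np

# Two different sub-pixel density patterns that yield the same pixel
# datum: within one resolution element, the imaging average is all we
# see, so no sub-resolution variation is learnable.
fine = 4                                  # sub-pixel cells per pixel edge
density_a = np.full((fine, fine), 2.0)    # uniform density
density_b = np.zeros((fine, fine))
density_b[::2, :] = 4.0                   # striped density, same mean

pixel_a = density_a.mean()                # datum from pattern a
pixel_b = density_b.mean()                # datum from pattern b
print(pixel_a, pixel_b)                   # identical pixel data
```

Both patterns give a pixel datum of 2.0 despite being different fields, which is exactly why the model only seeks one density value per voxel.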
Hence we learn the density function at the grid points of a 3D Cartesian grid set up in the space of , and . No variation in the material density inside the th grid-cell can be learnt, where this grid-cell is defined as the "th voxel", which is the cuboid given by

- square cross-sectional area (on a constant plane) of size , with edges parallel to the and axes, located at the th beam incidence,

- depth along the axis ranging from ,
with , , . Then one pair of opposite vertices of the th voxel are respectively at points and , where the th beam incidence is at the point and
(3.1) 
See Figure 2 for a schematic depiction of a general voxel. Here is the number of intervals of width along the axis, as well as along the axis. Thus, on the part of the plane that is imaged, there are squares of size . Formally, the density function that we can learn using such discrete image data is represented as
(3.2) 
This representation suggests that a discretised version of the sought density function is what we can learn using the available image data. We do not invoke any distribution for the unknown parameters . It is to be noted that the number of parameters that we seek is deterministically driven by the image data that we work with; the discretisation of the data drives the discretisation of the space of , and , and the density in each of the resulting voxels is sought. Thus, when the image has a resolution and there are such image data sets recorded at a value of , the number of sought density parameters is , where is the number of squares of size that comprise the part of the surface of the material slab that is imaged. Here the number of parameters is typically large; in the application to real data that we present later, this number is 2250. We set up a generic model that allows for the learning of such a large, data-driven number of distribution-free density parameters in a Bayesian framework.
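The data-driven parameter count can be sketched with one hypothetical split that is consistent with the quoted figure of 2250; the actual number of surface squares and beam energies used in the real-data application are as reported in Section 10, so the factors below are assumptions.

```python
# Bookkeeping of the number of sought density parameters, in the spirit
# of Section 3.1 (the 15x15 grid and 10 energies are hypothetical; the
# real-data application quotes 2250 parameters in total).
n_squares = 15 * 15   # squares of edge delta covering the imaged surface
n_energies = 10       # beam energies; one depth slab learnable per energy
n_density_params = n_squares * n_energies
print(n_density_params)
```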
3.2 Defining a general interaction-volume
Under the consideration of a hemispherical shape of the interaction-volume, the th interaction-volume is completely specified by pinning its centre to the th beam pointing at on the =0 plane and fixing its radius to (?), where the constant of proportionality is known from microscopy theory. (Footnote 2: In the context of the application to microscopy, atomic theory studies undertaken by ? suggest that the maximal depth to which an electron beam can penetrate inside a given material sample (measured in m), for a beam of electrons of energy (measured in kV), inside material of mass density (measured in gm cm), atomic number and atomic weight (per gm per mole), is given by Equation 3.3. Here is an integer-valued atomic number of the material, while and are positive real-valued constants. As increases, the depth and radial extent of the interaction-volume increase.) Here and are defined in Equation 3.1. In this hemispherical geometry, the maximal depth that the th interaction-volume goes down to is .
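The penetration-depth relation in the footnote is often quoted in the microscopy literature in the Kanaya-Okayama form; the sketch below uses that form, but the constants 0.0276, 1.67 and 0.89 and the copper example are taken from that literature as assumptions and should be checked against Equation 3.3 and the cited text.

```python
# Penetration-depth scaling in the spirit of the Kanaya-Okayama relation:
#   R ~ 0.0276 * A * E^1.67 / (Z^0.89 * rho)
# with R in micrometres, E in keV, rho in g/cm^3, A in g/mol
# (constants hedged; verify against Equation 3.3).
def penetration_depth_um(E_keV, Z, A, rho):
    return 0.0276 * A * E_keV ** 1.67 / (Z ** 0.89 * rho)

# Example: copper (Z=29, A=63.5, rho=8.96 g/cm^3) at 10 and 20 keV --
# the interaction-volume deepens as the beam energy grows.
d10 = penetration_depth_um(10.0, 29, 63.5, 8.96)
d20 = penetration_depth_um(20.0, 29, 63.5, 8.96)
print(d10, d20)
```

The monotone growth of depth with beam energy is the property the multi-energy design exploits: each energy interrogates a deeper slab of the sample.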
The measured image datum in the th pixel, created at , is . This results when the convolution of the unknown density and kernel in the voxels that lie inside the th interaction-volume is sequentially projected (as mentioned in Section 3) onto the centre of the th interaction-volume, i.e. onto the point . This projection is referred to as . The integral representation of such a projection involves three integrals, one over each of the three spatial coordinates that define the th interaction-volume, according to
(3.4) 
for . Here the vector represents the value of the displacement from the point of incidence to a general point inside the interaction-volume, i.e.
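A toy numerical stand-in for the triple integral of Equation 3.4 can be built by Monte Carlo: the convolution is replaced here by a constant over a hemispherical interaction-volume of radius R, so the exact answer is the hemisphere volume times that constant (the constant integrand and unit radius are assumptions for illustration).

```python
import numpy as np

# Monte Carlo sketch of a triple integral over a hemispherical
# interaction-volume of radius R, centred at the beam-incidence point.
# A constant integrand stands in for the density-kernel convolution.
rng = np.random.default_rng(2)
R = 1.0
n = 200_000
pts = rng.uniform(-R, R, size=(n, 3))
pts[:, 2] = rng.uniform(0.0, R, size=n)     # depth coordinate: only >= 0
inside = (pts ** 2).sum(axis=1) <= R ** 2   # points in the hemisphere

f = 1.0                                     # constant convolution value
box_volume = (2 * R) * (2 * R) * R          # bounding box of the hemisphere
estimate = f * inside.mean() * box_volume
exact = 2.0 / 3.0 * np.pi * R ** 3          # hemisphere volume
print(estimate, exact)
```

For the real integrand, the density varies voxel by voxel inside the interaction-volume, which is what makes symmetry arguments (next subsection) valuable for reducing the number of integrals.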
3.3 Cases classed by image resolution
We realise that the thickness of the electron beam imposes an upper limit on the image resolution, i.e. on the smallest length over which structure can be learnt from the available image data. For example, the resolution is finer when the SEM image is taken in back-scattered electrons ( 0.01m) than in X-rays ( 1m). The comparison between the cross-sectional areas of a voxel and of an interaction-volume, on the =constant plane, is determined by

- since the area of the voxel at any is .

- the atomic-number parameter of the material at hand (see Equation 3.3), since the radius of the hemispherical interaction-volume is a monotonically decreasing function of , at .
Then we can think of three different resolution-driven models such that
- voxel cross-sectional area on the =0 plane exceeds that of interaction-volumes attained at all , i.e. .
- voxel cross-sectional area on the =0 plane exceeds that of interaction-volumes at some values of but is lower than that of interaction-volumes attained at higher values, i.e. and .
- voxel cross-sectional area on the =0 plane is lower than that of interaction-volumes attained at all , i.e. .
The first two models are realised for coarse resolution of the imaging technique, i.e. for large . This is contrasted with the 3rd model, which pertains to fine resolution, i.e. low . Since is a monotonically decreasing function of (Equation 3.3), the 1st model is feasible for “high materials”, while the 2nd model is of relevance when the material at hand is a “low material”. To make this more formal, we introduce two classes of materials as follows. For fixed values of all parameters but ,
(3.6)  
The computation of the sequential projections involves multiple integrals (see Equation 3.4). The required number of such integrals can be reduced by invoking symmetries in the material density in any interaction-volume. Such symmetries become available as the resolution of the imaging technique becomes coarser. For example, for the 1st model, when the resolution is the coarsest, the minimum length learnt from the data exceeds the diameter of the lateral cross-section of the largest interaction-volume, achieved at the highest value of , i.e. for . (Here "lateral cross-section" means the cross-section on the zero-depth plane.) Then for the coarsest resolution, (see Figure 3) and so , which means the lateral cross-section of the th interaction-volume is contained inside that of the th voxel. Now, in our discretised representation, the material density inside any voxel is a constant. It therefore follows that when resolution is the coarsest, the material density at any depth inside an interaction-volume is a constant, independent of the angular coordinate (see Equation 3.4). Thus, the projection of onto the centre of the interaction-volume does not involve integration over this angular coordinate.
However, such a reduction is not possible for the 3rd model, at finer imaging resolution. When the resolution is the finest possible with an SEM, is the smallest. In this case, the lateral cross-sections of many voxels fill up the cross-sectional area of a given interaction-volume, so that the density varies within the interaction-volume. The projection of therefore cannot forgo integration over .
The last case that we consider falls in between these extremes of resolution; this is the 2nd model. Isotropy is valid at lower values of but not at higher values (see middle panel of Figure 3).
3.4 Noise in discrete image data
As discussed in Section 1, noise in the data can crucially influence how well-conditioned the inverse problem is. Image data are invariably subject to noise, though for images taken with Scanning Electron Microscopes (SEM) the noise is small. Moreover, since imaging with an SEM is a controlled experiment, this noise can be reduced by the microscopist by increasing the length of time taken to scan the sample with the incident electron beam. As an illustration of the noise in real SEM data, ? suggest that in a typical 20-second SEM scan the signal-to-noise ratio is about 20:1, which can be improved to 120:1 for a longer acquisition time of 10 minutes (in modern SEMs, noise reduction is alternatively handled using pixel averaging; "Stereoscan 430 Operator Manual", Leica Cambridge Ltd., Cambridge, England, 1994). Importantly, this noise can arise due to variations in the beam configuration, detector irregularities, pressure fluctuations, etc. In summary, such noise is considered to be due to random variations in parameters that are intrinsically continuous, motivating a Gaussian noise distribution. Thus, the noise in the 2D image data created by the th beam pointing for , (i.e. the noise in ) is modelled as a random normal variable, drawn from the normal with standard deviation 0.05, . In the illustration with real data, a scan speed of 50 s was used, which implies a noise of less than 5%.
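The noise model above can be sketched as follows (a minimal illustration, assuming image intensities normalised so that the stated standard deviation of 0.05 applies directly; the function name and array layout are hypothetical):

```python
import numpy as np

def add_sem_noise(image, noise_sd=0.05, seed=None):
    """Add zero-mean Gaussian noise to a unit-normalised SEM image.

    noise_sd = 0.05 corresponds roughly to the 20:1 signal-to-noise
    ratio of a short SEM scan; longer acquisition times lower noise_sd.
    """
    rng = np.random.default_rng(seed)
    return image + rng.normal(0.0, noise_sd, size=image.shape)

# A flat unit-intensity test image with 5% noise
clean = np.ones((100, 100))
noisy = add_sem_noise(clean, noise_sd=0.05, seed=0)
```

The same draw is what the Gaussian likelihood of Section 3.6 assumes for each pixel, independently across beam pointings and beam-energy values.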
3.5 Identifying lowrank and spatiallyvarying components of density
It is known in the literature that the general, underdetermined deconvolution problem is solvable only if the unknown density is intrinsically "sufficiently" sparse (?; ?). Here we advance a methodology, within the frame of a designed experiment, to learn the density, whether sparse or dense. In the following subsection, we will see that the priors on sparsity that we develop here bring relatively more information into the model when the density structure is sparse.
With this in mind, the density is recognised to be made of a constant part (the low-rank component in the limiting sense) and a spatially varying component that may be sparse or dense in . We view the constant part of the density as , where is the Dirac delta function on (?), centred at the centre of the th interaction-volume, . Then, in our problem, the contribution of the constant part of the density to the projection onto the centre of the th interaction-volume is
(3.7)  
a constant independent of the beam pointing location , provided is restricted to be a function of the depth coordinate only. As discussed in Section 4, this is indeed what we adopt in the model for the kernel.
Then depends only on the known morphological details of the interaction-volume for a given value of , . Thus , where is the spatially varying component of the image data. The constant component of the density is easily identified from the constant component of the measurable.
In our inversion exercise, it is the field that is actually implemented as data, after is subtracted from , for each . Hereafter, when we refer to the data, the spatially varying part of the data will be implied; it is this part of the data that will hereafter be referred to as , at each value of , . Its inversion will yield a spatially varying sparse/dense density, which will from now on be referred to as , and which in general lies in a non-convex subset of . Thus, we see that in this model it is possible for to be 0. The construction of the full density, inclusive of the low-rank and spatially varying parts, is straightforward once the latter is learnt.
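The subtraction step described above can be sketched as follows (a minimal illustration with hypothetical names; the per-energy constant contribution is assumed to have been precomputed from the kernel model and the interaction-volume morphology):

```python
import numpy as np

def spatially_varying_data(images, constant_contribution):
    """Subtract the constant (low-rank) component of the measurable from
    each image, one image per beam-energy value.

    images:                array of shape (n_energies, ny, nx)
    constant_contribution: array of shape (n_energies,), the projection
                           of the constant density part at each energy
    """
    images = np.asarray(images, dtype=float)
    # Broadcast the per-energy constant over the pixel grid and subtract.
    return images - np.asarray(constant_contribution)[:, None, None]

imgs = np.full((3, 4, 4), 2.0)       # toy data at 3 beam energies
offsets = np.array([0.5, 1.0, 1.5])  # hypothetical constant parts
sv = spatially_varying_data(imgs, offsets)
```

The residual field `sv` is what plays the role of the data in the inversion; it can legitimately be zero in voxels where only the constant component contributes.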
3.6 Basics of algorithm leading to learning of unknowns
The basic scheme for learning the unknown functions is as follows. First, we perform multiple projections of the convolution of the unknown functions in the forward problem, onto the incidence point of the th interaction-volume, . The likelihood is defined as a function of the Euclidean distance between the spatially averaged projection and the image datum . We choose to work with a Gaussian likelihood (Equation 7.1). Since the imaging at any value of along any of the viewing vectors is done independently of all other values of and all other viewing vectors, we assume the image data to be conditionally independent given the values of the model parameters.
We develop adaptive priors on the sparsity of the density and present strong to weak priors on the kernel, to cover the range of high to low information about the kernel that may be available. In concert with these, the likelihood leads to the posterior probability of the unknowns given all the image data. The posterior is sampled using adaptive Metropolis-Hastings-within-Gibbs. The unknown functions are updated using respective proposal densities. Upon convergence, we present the learnt parameters with 95% highest-probability-density credible regions. Identifiability of the solutions for the two unknown functions, and uniqueness of the solutions, are also considered in detail.
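A generic sketch of such a sampler follows (a toy adaptive Metropolis-within-Gibbs on a stand-in two-dimensional Gaussian target; the coordinate-wise updates and the acceptance-rate-driven adaptation of each proposal scale are common choices, not necessarily the exact scheme used here):

```python
import numpy as np

def adaptive_mwg(log_post, x0, n_iter=5000, target_acc=0.44, seed=0):
    """Adaptive Metropolis-within-Gibbs: update one coordinate at a
    time, steering each coordinate's proposal scale towards target_acc."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    d = x.size
    log_scale = np.zeros(d)          # per-coordinate log proposal sd
    acc = np.zeros(d)                # per-coordinate acceptance counts
    chain = np.empty((n_iter, d))
    lp = log_post(x)
    for t in range(n_iter):
        for j in range(d):
            prop = x.copy()
            prop[j] += rng.normal(0.0, np.exp(log_scale[j]))
            lp_prop = log_post(prop)
            if np.log(rng.uniform()) < lp_prop - lp:
                x, lp = prop, lp_prop
                acc[j] += 1.0
            # Diminishing (Robbins-Monro style) adaptation of the scale
            rate = acc[j] / (t + 1)
            log_scale[j] += (rate - target_acc) / np.sqrt(t + 1)
        chain[t] = x
    return chain

# Stand-in target: independent standard normals in 2D
chain = adaptive_mwg(lambda x: -0.5 * np.sum(x**2), x0=[3.0, -3.0])
```

In the paper's setting, `log_post` would be the log of the posterior of the density and kernel parameters given the image data, with the folded normal and sparsity-adaptive priors of Sections 4 and 5 added to the Gaussian log-likelihood.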
4 Kernel function
Identifiability between the solutions for the unknown material density and kernel is aided by the availability of a measurement of the kernel at the surface of the material sample, and by the model assumption that the kernel function depends only on the depth coordinate and is independent of and . Thus, the correction function is . In the discretised space of , and , the kernel is then defined as
(4.1) 
Then is the value of the kernel function on the material surface, i.e. at , and this is available from microscopy theory (?).
On occasions when the shape of the kernel function is known for the material at hand, only the parameters of this shape need to be learnt from the data. Given the measured value , i.e. the measured , the total number of kernel parameters is then 2. In this case, the total number of parameters that we attempt to learn from the data is , i.e. the 2 kernel shape parameters and material density parameters. We refer to such a model for the kernel as parametric. For other materials, the shape of the kernel function may be unknown; in that case, a distribution-free model of the parameters of the vectorised kernel function is learnt, and the total number of parameters that we learn is .
We fall back on elicitation from the literature on electron microscopy to obtain the priors on the kernel parameters. In microscopy practice, the measured image data collected along an angle to the vertical are given by , where and , with a proportionality constant that is not known a priori but needs to be estimated using system-specific Monte Carlo simulations or approximations based on curve-fitting techniques; see ?. is the distribution of the variable and is again not known a priori, but the suggestion in the microscopy literature has been that it can be estimated using Monte Carlo simulations of the system or from atomistic models. However, these simulations or model-based calculations are material-specific, and their viability for inhomogeneous material samples is questionable. Given the lack of modularity in the modelling of the relevant system parameters within the conventional approach, and the existence in the microscopy literature of multiple models that are distinguished by the underlying approximations (?; ?), it is meaningful to seek to learn the correction function.
Our construct differs from this formulation in that we consider an infinitesimally small volume element of depth inside the material, at the point . In the limit that this depth tends to zero, the density of the material inside this infinitesimally small volume is a constant, namely . Thus, the image datum generated from within this infinitesimally small volume, via convolution of the material density and the kernel, is . Over this infinitesimal volume, our formulation then ties in with the representation in microscopy theory if we set . It follows that is motivated to have the same shape as , where is a known, material-dependent constant ( depends upon whether the considered image is in Back Scattered Electrons (BSE) or X-rays; in the former case it represents the BSE coefficient, often estimated with Monte Carlo simulations, and in the latter, the linear attenuation coefficient). When information about this shape is invoked, the model for is referred to as parametric; however, keeping general application contexts in mind, a less informative prior for the unknown kernel is also developed.
4.1 Parametric model for correction function
Information is sometimes available on the shape of the function that represents the kernel. For example, we could follow the aforementioned suggestion of approximating the form of as , multiplied by the scale factor , as long as is known for the material and imaging technique at hand. In fact, for the values of beam energy that we work at, for most materials , so that 1 (http://physics.nist.gov/PhysRefData/XrayMassCoef/tab3.html, Goldstein et al. 2003). Thus, we approximate the shape of to resemble that of , as given in the microscopy literature. This shape is akin to that of the folded normal distribution (?), so we invoke the folded normal density function to write
(4.2) 
Thus, in this model, is deterministically known if the parameters , and are, where and are the mean and dispersion that define the above folded normal distribution. Of these three parameters, only two are independent, since the kernel is known on the surface, i.e. is known (see Section 4). Thus, by setting , we relate deterministically to and . In this model of the correction function, we put folded normal priors on and . This model of the correction function is parametric in the sense that is parametrised, and priors are put on its unknown parameters.
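The folded normal density invoked above has a standard closed form; a sketch follows (the parameter values, the depth grid, and the measured surface value are illustrative, not those learnt for any material):

```python
import numpy as np

def folded_normal_pdf(x, mu, sigma):
    """Density of |Y| where Y ~ N(mu, sigma^2); zero for x < 0."""
    x = np.asarray(x, dtype=float)
    z1 = np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))
    z2 = np.exp(-(x + mu) ** 2 / (2.0 * sigma ** 2))
    pdf = (z1 + z2) / (sigma * np.sqrt(2.0 * np.pi))
    return np.where(x >= 0.0, pdf, 0.0)

# Kernel shape model: a folded normal in the depth coordinate,
# rescaled so its value at zero depth matches the measured surface value.
z = np.linspace(0.0, 5.0, 1001)
shape = folded_normal_pdf(z, mu=1.0, sigma=0.8)
surface_value = 2.0                    # hypothetical measured value
kernel = shape * (surface_value / shape[0])
```

Pinning the amplitude to the measured surface value is exactly the constraint that leaves only two of the three folded normal parameters free.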
4.2 Distributionfree model for correction function
In this model for the correction function, we choose folded normal priors for , i.e. with . The folded normal priors on (?) encode the fact that and that there is a non-zero probability of being zero. Thus folded normal and truncated normal priors for would be relevant, but gamma priors would not be. Also, microscopy theory allows the kernel at the surface to be known deterministically, i.e. is known (Section 4). Then, setting 1, we relate and . This relation is used to compute , while uniform priors are assigned to the hyperparameters , . We refer to this as the "distribution-free model of ".
We illustrate the effects of both modelling strategies in simulation studies that are presented in Section 9.
The correction function is normalised by , where is the estimated unscaled correction function in the first bin, i.e. for , and is the measured value of the correction function at the surface of the material, given in the microscopy literature. It is to be noted that the available knowledge of allows for the identifiability of the amplitudes of and .
5 Priors on sparsity of the unknown density
In this section we present the adaptive priors on the sparsity of the density, developed by examining the geometrical aspects of the problem.
As we are trying to develop priors on the sparsity, we begin by identifying the voxels in which the density is zero, i.e. . This identification is made possible by invoking the nature of the projection operator . For example, we realise that it is possible for the measured image datum collected from the th interaction-volume to be non-zero even when the density in the th voxel is zero, owing to contributions to from neighbouring voxels that are included within the th interaction-volume. Such understanding is used to identify the voxels in which the density is zero. For a voxel in which the density is non-zero, we subsequently learn the value of the density. The constraints that lead to the identification of voxels with null density can then be introduced into the model via the prior structure in the following ways.

We could check whether = , where is the projection onto the centre of the th interaction-volume computed without including the density of the th voxel, i.e. without including . If so, then . However, this check would demand the computation of over a given interaction-volume as many times as there are voxels lying fully or partially inside this interaction-volume. We avoid this computation as it is prohibitively expensive, and instead opt for a probabilistic criterion for when the density in a voxel is zero. Above, , .

We expect that the projection of onto the centre of the interaction-volume achieved at will in general exceed the projection onto the same central point of a smaller interaction-volume (at ). However, if the density of the th voxel is zero or very low, the contribution from the interaction-volume generated at may not exceed that from the interaction-volume generated at . Thus, it might be true that
(5.1) . This statement is true with some probability. An alternate representation of statement 5.1 is achieved as follows. We define the random variable , with , such that
(5.2) Then statement 5.1 is the same as the statement "it might be true that or is very low", holding with some probability . In fact, as the projection from the bigger interaction-volume is in general in excess of that from a smaller interaction-volume, we understand that the closer is to , the higher is the probability that is close to 0. In other words, the smaller is , the higher is the probability that is close to 0, where
(5.3) with the hyperparameter controlling the non-linearity of the response of the function to an increase in . The advantage of the chosen form of is that it is monotonic and its response to an increasing value of its argument is controlled by a single parameter, namely . We assign a hyperprior that is uniform over the experimentally determined range [0.6, 0.99], to ensure that is flatter for lower and steeper for higher (as moves across the range ). The prior on the density parameter is then defined as
(5.4) Thus, [0,1] . Any normalisation constant on the prior can be subsumed into the normalisation of the posterior of the unknowns given the data; had we used a normalisation of , for all , then at those for which or , the prior on would have reduced to a normal . This is because, for such , , so that . However, for , the prior on is non-normal, as the dispersion is itself dependent on . This prior then adapts to the sparsity of the material density distribution. We do not aim to estimate the degree of sparsity of the material density in our work, but to develop a prior such that sampled from this prior represents the material density distribution at the given , however sparse the vector is. Evidently, we do not use a mixture prior, but opt for no mixing, as in ?. Indeed, in our prior structure, the term could have been replaced by (as in the parametric Laplace priors suggested by ?, ?, ?), i.e. by , since is non-negative; but as far as sparsity tracking is concerned, which is our focus in developing this prior, the prior in Equation 5.4 suffices.
That such a prior probability density sensitively adapts to the sparsity of the material density distribution is borne out in the results of two simulation studies shown in Figure 4. In these studies, the density parameter values in the th voxel are simulated from two simplistic toy models that differ from each other in the degree of sparsity of the true material density distribution: and respectively (where are uniformly distributed random numbers in ), at a chosen and energy indices . In the simulations, we specify the beam penetration depth as suggested by ?; as any interaction-volume is hemispherical, its radius is . The kernel parameters are generated from a quadratic function of with noise added. In the simulations, the material is imaged at a resolution such that , i.e. the "1st model" is relevant (see Section 3.3). This allows for simplification of the computation of according to Equation 3.4. Then, at this , for , are plotted in Figure 4 against , as is the logarithm of the prior computed according to Equation 5.4, with held as a random number, uniform in [0.6, 0.99]. Logarithms of the priors are also plotted as a function of the material density parameter. We see from the figure that the prior developed here tracks the sparsity of the vector well.
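The qualitative behaviour of such a sparsity-adaptive prior can be sketched as follows. The functional forms below are hypothetical stand-ins, not the published ones: a power-law response function with a single shape hyperparameter tau, and a zero-mean normal-shaped prior whose dispersion depends on the response. They reproduce only the stated qualitative properties, namely monotonicity of the response and a prior that concentrates at zero density when the response variable is small:

```python
import numpy as np

def response(nu, tau):
    """Hypothetical monotonic response g(nu), nu in [0, 1];
    tau in [0.6, 0.99] controls how sharply g responds to nu
    (flatter for lower tau, steeper for higher tau)."""
    return np.power(nu, 1.0 / (1.0 - tau))

def log_sparsity_prior(rho, nu, tau):
    """Normal-shaped log prior on a non-negative density parameter rho,
    with dispersion shrinking as nu -> 0, so that small nu concentrates
    the prior at rho = 0 (the sparsity-adapting behaviour)."""
    sigma = response(nu, tau) + 1e-12   # dispersion depends on nu
    return -0.5 * (rho / sigma) ** 2

# Small nu => prior sharply penalises non-zero rho (sparse regime);
# nu near 1 => prior is much flatter over moderate rho (dense regime).
lp_sparse = log_sparsity_prior(rho=0.5, nu=0.1, tau=0.8)
lp_dense = log_sparsity_prior(rho=0.5, nu=0.9, tau=0.8)
```

This mirrors the statement above that the prior is normal-like with a dispersion that itself depends on the comparison of projections, making the prior non-normal overall and adaptive to sparsity.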
6 Three models for image inversion
In this section we develop the projection of the convolution of the unknown density and kernel onto the centre of the th interaction-volume, for each of the 3 models discussed in Section 3.3, motivated by the three classes of image data distinguished by their resolution. The difference between the size of a voxel and that of an interaction-volume determines the difficulty of the inversion of the image data.
As explained in Section 3.3, we attempt to identify means of dimensionality reduction, i.e. of reducing the number of integrals involved in the sequential projection of (see Equation 3.4). We do this by identifying isotropy in the distribution of the material density within the interaction-volume where possible, which eliminates the need to average over the angular coordinate.
6.1 1st and 2nd models: Low resolution
This class of image resolutions () pertains to the case when the system is imaged by an SEM in X-rays.
6.1.1 High-Z Systems
In the 1st model, for high-Z materials, the cross-sectional area of an interaction-volume attained at any will fit wholly inside that of a voxel, where the cross-section is on a constant-depth plane. Then, at a given depth, the density inside this interaction-volume is that inside the voxel, i.e. it is a constant. Thus, dimensionality reduction is most easily achieved in this case, with the density function isotropic inside an interaction-volume, bearing no dependence on the angular coordinate of Equation 3.4. Revisiting Equation 3.4 in its discretised form, the discrete convolution within the th bin for the th beam pointing gives , so that the projection onto the centre of the th interaction-volume is discretised to give the following.
(6.1) 
6.1.2 Low-Z Systems
In the 2nd model, for low-Z materials, for any , and , the cross-sectional area of the th interaction-volume on the zero-depth plane will spill out of the th voxel into the neighbouring voxels. Then isotropy within the th interaction-volume holds good only for at . In general, the projection onto the centre of the th interaction-volume includes contributions from all those voxels that lie wholly or partly inside this interaction-volume. This projection is then computed by first learning the weighted average of the contributions from all the relevant voxels, and then distributing this learnt weighted average over the identified voxels in proportion to the weights, which are in turn proportional to the fraction of each voxel's volume that lies within the th interaction-volume. Thus, for the th voxel that is a neighbour of the th voxel lying wholly inside the th interaction-volume, let this proportion be , where .
In general, at a given depth, any bulk voxel has 8 neighbouring voxels; when the voxel lies at a corner or edge of the sample, the number of nearest neighbours is less than 8. Then, at a given depth, there will be contributions from at most 9 voxels towards . At , for any , let the maximum number of nearest neighbours be , so that . The notation for this number bears its dependence on both and .
We define as the weighted average of the densities in the th voxel and its nearest neighbours that are fully or partially included within the th interaction-volume. Here , , . Thus,
(6.2) 
where the th neighbour of the th voxel at the same depth harbours the density , and there is a maximum of such neighbours. The effect of this averaging over the nearest neighbours at this depth is equivalent to averaging over the angular coordinate, and results in the angular-averaged density at this depth, which, by definition, is isotropic, i.e. independent of the angular coordinate. Then the projection onto the centre of the th interaction-volume, for , is computed as in Equation 6.1, with the material density term on the RHS of that equation replaced by the isotropic angular-averaged density . However, for , the projection is computed as in Equation 6.1.
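The nearest-neighbour weighted average described above can be sketched as follows (toy values; the weights are taken proportional to the fraction of each voxel's volume contained in the interaction-volume, normalised to sum to unity, as described in the text):

```python
import numpy as np

def angular_averaged_density(densities, volume_fractions):
    """Weighted average of the densities of a voxel and its nearest
    neighbours (at one depth), weighted by the fraction of each voxel's
    volume contained in the interaction-volume.

    densities:        up to 9 values (centre voxel plus <= 8 neighbours)
    volume_fractions: matching fractions of each voxel lying inside
                      the interaction-volume
    """
    d = np.asarray(densities, dtype=float)
    v = np.asarray(volume_fractions, dtype=float)
    w = v / v.sum()                  # normalise weights to sum to 1
    return float(np.dot(w, d))

# Toy case: centre voxel fully inside, 4 edge neighbours half inside
rho_bar = angular_averaged_density([2.0, 1.0, 1.0, 1.0, 1.0],
                                   [1.0, 0.5, 0.5, 0.5, 0.5])
```

The returned value plays the role of the isotropic, angular-averaged density that replaces the raw voxel density in the projection formula at the relevant depths.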
6.2 3rd model: Fine resolution
In certain imaging techniques, such as imaging in Back Scattered Electrons with an SEM or FESEM, the resolution is fine, i.e. , . In this case, at a given , material density from multiple voxels contributes to , and the three integrals involved in the computation of this projection, as given in Equation 3.4, cannot be avoided. Knowing the shape of the interaction-volume, it is possible to identify the voxels that lie partly or wholly inside the th interaction-volume, as well as to compute the fractional or full volume of each such voxel inside the th interaction-volume.
For this model, the projection equation is written in terms of the Cartesian coordinates of a point instead of the polar-coordinate representation of that point, where the point in question lies inside the th interaction-volume centred at . Then, inside the th interaction-volume, at a given and , . For