Bayesian Density Estimation via Multiple Sequential Inversions of 2-D Images with Application in Electron Microscopy

# Bayesian Density Estimation via Multiple Sequential Inversions of 2-D Images with Application in Electron Microscopy

Dalia Chakrabarty, University of Warwick, d.chakrabarty@warwick.ac.uk and
University of Leicester, dc252@le.ac.uk
Fabio Rigat, Novartis Vaccines and Diagnostics, fabio.rigat@novartis.com
Nare Gabrielyan, De Montfort University, nare.gabrielyan@email.dmu.ac.uk
Richard Beanland, University of Warwick, r.beanland@warwick.co.uk
Shashi Paul, De Montfort University, spaul@dmu.ac.uk

Authors’ affiliations:

Associate Research Fellow in Department of Statistics, University of Warwick, Coventry CV4 7AL, U.K., corresponding author,
Lecturer of Statistics, Department of Mathematics, University of Leicester, Leicester LE1 7RH, U.K.,
Research Biostatistics Group Head, Novartis Vaccines and Diagnostics and Associate fellow at Department of Statistics, University of Warwick,
Graduate student at Emerging Technologies Research Centre, De Montfort University,
Lecturer, Department of Physics, University of Warwick,
Head, Emerging Technologies Research Centre, De Montfort University

Abstract

We present a new Bayesian methodology to learn the unknown material density of a given sample by inverting its two-dimensional images that are taken with a Scanning Electron Microscope. An image results from a sequence of projections of the convolution of the density function with the unknown microscopy correction function that we also learn from the data. We invoke a novel design of experiment, involving imaging at multiple values of the parameter that controls the sub-surface depth from which information about the density structure is carried, to result in the image. Real-life material density functions are characterised by high density contrasts and typically are highly discontinuous, implying that they exhibit correlation structures that do not vary smoothly. In the absence of training data, modelling such correlation structures of real material density functions is not possible. So we discretise the material sample and treat values of the density function at chosen locations inside it as independent and distribution-free parameters. Resolution of the available image dictates the discretisation length of the model; three models pertaining to distinct resolution classes are developed. We develop priors on the material density, such that these priors adapt to the sparsity inherent in the density function. The likelihood is defined in terms of the distance between the convolution of the unknown functions and the image data. The posterior probability density of the unknowns given the data is expressed using the developed priors on the density and priors on the microscopy correction function as elicitated from the Microscopy literature. We achieve posterior samples using an adaptive Metropolis-within-Gibbs inference scheme. The method is applied to learn the material density of a 3-D sample of a nano-structure, using real image data. Illustrations on simulated image data of alloy samples are also included.

Keywords: Bayesian methods, Inverse Problems, Nonparametric methods, Priors on sparsity, Underdetermined problems

## 1 Introduction

Nondestructive learning of the full three-dimensional material density function in the bulk of an object, using available two dimensional images of the object, is an example of a standard inverse problem (?????). The image results from the projection of the three dimensional material density onto the image plane. However, inverting the image data does not lead to a unique density function in general; in fact to render the inversion unique, further measurements need to be invoked. For instance, the angle at which the object is imaged or viewed is varied to provide a series of images, thereby expanding available information. This allows to achieve ditinguishability (or identifiability) amongst the solutions for the material density. A real-life example of such a situation is presented by the pursuit of material density function using noisy 2-dimensional (2-D) images taken with electron microscopy techniques (?). Such non-invasive and non-destructive 3-D density modelling of material samples is often pursued to learn the structure of the material in its depth (?), with the ulterior aim of controlling the experimental conditions under which material samples of desired qualities are grown.

Formally, the projection of a density function onto a lower dimensional image space is referred to as the Radon Transform; see ?, ?. The inverse of this projection is also defined but requires the viewing angle as an input and secondly, involves taking the spatial derivative of the density function, rendering the computation of the inverse projection numerically unstable if the image data is not continuous or if the data comprises limited-angle images (??) or if noise contaminates the data (?). Furthermore, in absence of measurements of the viewing angle, the implementation of this inverse is not directly possible, as ? suggested. Even when the viewing angle is in principle measurable, logistical difficulties in imaging at multiple angles result in limited-angle images. For example, in laboratory settings, such as when imaging with Scanning Electron Microscopes (SEMs), the viewing angle is varied by re-mounting the sample on stubs that are differently inclined each time. This mounting and remounting of the sample is labour-intensive and such action leads to the breaking of the vacuum within which imaging needs to be undertaken, causing long waiting periods between two consecutive images. When vacuum is restored and the next image is ready to be taken, it is very difficult to identify the exact fractional area of the sample surface that was imaged the last time, in order to scan the beam over that very area. In fact, the microscopist would prefer to opt for an imaging technique that allows for imaging without needing to break the vacuum in the chamber at all. Such logistical details are restrictive in that this allows imaging at only a small number of viewing angles, which can cause the 3-D material density reconstruction to be numerically unstable. This problem is all the more acute when the data is discontinuous and heterogeneous in nature. Indeed, with the help of ingenious imaging techniques such as compressive sensing (?), the requirement of a large number of image data points is potentially mitigated. If implemented, compressive imaging would imply averaging the image data over a chosen few pixels, with respect to a known non-adaptive kernel. Then the inversion of such compressed data would require the computation of one more averaging or projection (and therefore one more integration) than are otherwise relevant. While this is easily done, the implementation of compressive sensing of electron microscopy data can be made possible only after detailed instrumentational additions to the imaging paraphernalia is made. Such instrumentational additions are outside the scope of our work. We thus invoke a novel design of imaging experiment that involves imaging at multiple values of some relevant parameter that is easily varied in a continuous way over a chosen range, unlike the viewing angle. In this paper, we present such an imaging strategy that allows for multiple images to be recorded, at each value of this parameter, when a cuboidal slab of a material sampe is being imaged with an SEM.

In imaging with an SEM, the projection of the density function is averaged over a 3-D region inside the material, the volume of which we consider known; such a volume is indicated in the schematic diagram of this imaging technique shown in Figure 1. Within this region, a pre-fixed fraction of the atomistic interactions between the material and the incident electron beam stay confined, (??). The 2-D SEM images are taken in one of the different types of radiation that are generated as a result of the atomistic interactions between the molecules in the material and a beam of electrons that is made incident on the sample, as part of the experimental setup that characterises imaging with SEM, (???). Images taken with bulk electron microscopy techniques potentially carry information about the structure of the material sample under its surface, as distinguished from images obtained by radiation that is reflected off the surface of the sample, or is coming from near the surface111Since the radiation generated in an elemental three-dimensional volume inside the bulk of the material sample, is due to the interaction of the electron beam and the material density in that volume, it is assumed that the material density is proportional to the radiation density generated in an elemental 3-D volume. The radiation generated in an elemental volume inside the bulk of the material, is projected along the direction of the viewing angle chosen by the practitioner to produce the observed 2-D image..

In practice, the situation is rendered more complicated by the fact that for the 2-D images to be formed, it is not just the material density function, but the convolution of the material density function with a kernel that is projected onto the image space. The nature of the modulation introduced by this kernel is then also of interest, in order to disentangle its effect from that of the unknown density function that is responsible for generating the measured image. As for the modulation, both enhancement and depletion of the generated radiation is considered to occur in general. It is to be noted that this kernel is not the measure of blurring of a point object in the imaging technique, i.e. it is not the point spread function (PSF) of the observed image, but is characteristic of the material at hand. The kernel really represents the unknown microscopy correction function that convolves with the material density function–this convolution being projected to form the image data, which gets blurred owing to specifics of the imaging apparataus. Learning the de-blurred image from the PSF-convolved image data is an independent pursuit, referred to as Blind Deconvolution (???). Here our aim is to learn the material density and the microscopy correction function using such noisy image data.

The convolution of the unknown kernel with the unknown density, is projected first onto the 2-D surface of the system, and the resulting projection is then successively marginalised over the 2 spatial coordinates that span the plane of the system surface. Thus, three successive projections result in the image data. Here we deal with the case of an image resulting from a composition of a sequence of three independent projections of the convolution of the material density function and the unknown kernel or microscopy function. So only a sequence of inversions of the image allow us to learn this density. Such multiple inversions are seldom addressed in the literature, and here, we develop a Bayesian method that allows for such successive inversions of the observed noisy, PSF-convolved 2-D images, in order to learn the kernel as well as the density function. The simultaneous pursuit of both the unknown density and the kernel, is less often addressed than are reported attempts at density reconstruction, under an assumed model for the kernel, (????).

We focus here on the learning of a density function that is not necessarily convex, is either sparse or dense, is often multimodal - with the modes characterised by individualised, discontinuous substructure and abrupt bounds, resulting in the material density function being marked by isolated islands of overdensity and sharp density contrasts; we shall see this below in examples of reconstructed density functions presented in Section 9 and Section 10. Neither the isolated modes of the material density function in real-life material samples nor the adjacent bands of material over-density, can be satisfactorily modelled with a mixture model. Again, modelling of such a real-life trivariate density function with a high-dimensional Gaussian Process (GP) is not possible in lieu of training data, given that the covariance function of the invoked GP will then have to be non-stationary (?) and exhibit non-smooth spatial variation. Here by “training data” is implied data comprising a set of known values of the material density, at chosen points inside the sample; such known values of the density function are unknown. Even if the covariance function were varying smoothly over the 3-D spatial domain its parametrisation is in lieu of training data but with an abruptly evolving covariance function–as in the problem of inverting SEM image data–such parametrisation becomes impossible (?), especially when there is no training data available. At the same time, the blind implementation of the inverse Radon Transform would render the solution unstable given that the distribution of the image data in real-life systems is typically, highly discontinuous.

In the following section, (Section 2) we describe the experimental setup delineating the problem of inversion of the 2-D images taken with Electron Microscopy imaging techniques, with the aim of learning the sub-surface material density and the kernel. In addition, we present the novel design of imaging experiment that achieves multiple images. In Section 3 we discuss the outstanding difficulties of multiple and successive inversions and qualitatively explain the relevance of the advanced solution to this problem. This includes a discussion of the integral representation of the multiple projections (Section 3.2), the outline of the Bayesian inferential scheme adopted here (Section 3.6) and a section on the treatment of the low-rank component of the unknown material density (section 3.5). The details of the microscopy image data that affect the modelling at hand are presented in the subsequent section. This includes a description of the 3 models developed to deal with the three classes of image resolution typically encountered in electron microscopy (Section 3.3) and measurement uncertainties of the collected image data (Section 3.4). Two models on the kernel, as motivated by small and high levels of information available on the kernel given the material at hand are discussed in Section 4; priors on the parameters of the respective models is discussed here as well. Section 5 is devoted to the development of priors on the unknown density function such that the priors are adaptive to the sparsity in the density. In Section 6, the discretised form of the successive projections of the convolution of the unknown functions is given for the three different models. Inference is discussed in Section 7. Section 8 discusses with salient aspects of the uniqueness of the solutions. Application of the method to the analysis of real Scanning Electron Microscope image data is included in Section 10 while results from the inversion of simulated image data are presented in Section 9. Relevant aspects of the methodology and possible other real-life applications in the physical sciences are discussed in Section 11.

## 2 Application to Electron Microscopy Image Data

For our application, the system i.e. the slab of material, is modelled as a rectangular cuboid such that the surface at which the electron beam is incident is the =0 plane. Thus, the surface of the system is spanned by the orthogonal and -axes, each of which is also orthogonal to the -axis that spans the depth of the slab. In our problem, the unknown functions are the material density and the kernel .

Here we learn the unknown functions by inverting the image data obtained with a Scanning Electron Microscope. Though the learning of the convolution of and , is in principle ill-posed, we suggest a novel design of imaging experiment that allows for an expansion of the available information leading to being learnt uniquely when noise in the data is small (Section 8). This is achieved by imaging each part of the material sample at different values of a parameter such that images taken at different values of this parameter , carry information about the density function from different depths under the sample surface. Such is possible if represents the energy of the electrons in the incident electron beam that is used to take the SEM image of the system; since the sub-surface penetration depth of the electron beam increases with electron energy, images taken with beams of different energies inherently bear information about the structure of the material at different depths.

The imaging instrument does not image the whole sample all at the same time. Rather, the imaging technique that is relevant to this problem is characterised by imaging different parts of the sample, successively in time. At each of these discrete time points, the -th part, () of the sample is viewed by the imaging instrument, along the viewing vector to record the image datum in the -th pixel on the imaging screen. Thus, every pixel of image data represents information about a part of the sample. The -th viewing vector corresponds to the -th point of incidence of the electron beam on the surface of the material sample. The image datum recorded in the -th pixel then results from this viewing, and harbours information about the sample structure inside the -th interaction-volume, which is the region that bounds atomistic interactions between the incident electron beam and molecules of the material (see Figure 1). The point of incidence of the -th viewing vector, i.e. the centre of the -th interaction volume is marked in the diagram. The volume of any such 3-D interaction-volume is known from microscopy theory (?) and is a function of the energy of the beam electrons. In fact, motivated by the microscopy literature, (?), we model the shape of the -th interaction-volume as hemispherical, centred at the incidence point of , with the radius of this hemisphere modelled as . We image each part of the sample at different values of , such that the -th value of is , . To summarise, the data comprise a 2-D images, each image being a square spatial array of number of pixels such that at the -th pixel, for the -th value of , the image data is where . Consideration of this full set of images will then suggest the sub-surface density function of the sample, in a fully discretised model. Convolution of and is projected onto the system surface, i.e. the =0 plane and this projection is then further projected onto one of the or axes, with the resulting projection being projected once again, to the central point of the interaction-volume created by the -th incidence of the beam of energy , to give rise to the image datum in the -th pixel in the -th image.

Expanding information by imaging at multiple values of is preferred to imaging at multiple viewing angles for logistical reason as discussed above in Section 1. Further, the shape of the interaction-volume is rendered asymmetric about the line of incidence of the beam when the beam is tilted to the vertical (?, Figure 3.20 on page 92 of), where the asymmetry is dependent on the tilt angle and the material. Then it is no longer possible to have confidence in any parametric model of the geometry of the interaction-volume as given in microscopy theory. Knowledge of the symmetry of the interaction-volume is of crucial importance to any attempt in inverting SEM image data with the aim of learning the material density function. So in this application, varying the tilt angle will have to be accompanied by approximations in the modelling of the projection of . We wish to avoid this and therefore resort to imaging at varying values of .

Physically speaking, the unknown density would be the material density representing amount or mass of material per unit 3-D volume. The measured image datum could be the radiation in Back Scattered Electrons or X-rays, in each 2-D pixel, as measured by a Scanning Electron Microscope.

## 3 Modelling

The general inverse problem is where the data , while the unknown function , with . In particular, 3-D modelling involves the case of , . The case of is fundamentally an ill-posed problem (??). Here is the measurement noise, the distribution of which, we assume known.

In our application, the data is represented as the projection of the convolution of the unknown functions as:

 ~I=C[ρ∗η]+ε,

where the projection operator is a contractive projection from the space that lives in - which is - onto the image space . itself is a composition of 3 independent projections in general,

 C=P1∘P2∘P3,

where is the projection onto the =0 plane, followed by - the projection onto the =0 axis, followed by - projection onto the centre of a known material-dependent three dimensional region inside the system, namely the interaction volume. These 3 projections are commutable, resulting in an effective contractive projection of onto the centre of this interaction volume. Thus, the learning of requires multiple (three) inversions of the image data. In this sense, this is a harder than usual inversion problem. The interaction volume and its centre - at which point the electron beam is incident - are shown in Figure 1.

It is possible to reduce the learning of to a least-squares problem in the low-noise limit, rendering the inverse learning of unique by invoking the Moore Penrose inverse of the matrix that is the matrix representation of the operator (Section 8). The learning of and individually, from the uniquely learnt is still an ill-posed problem. In a non-Bayesian framework, a comparison of the number of unknowns with the number of measured parameters is relevant; imaging at number of values of suggests that parameters are known while at most are unknown. Thus, for the typical values of and used in applications, ratio of known to unknown parameters is , using the aforementioned design of experiment (see Section 8.1). The unknown parameters still renders the individual learning of and non-unique. Such considerations are however relevant only in the absence of Bayesian spatial regularisation (??), and as we will soon see, the lack of smooth variation in the spatial correlation structure underlying typically discontinuous real-life material density functions, render the undertaking of such regularisation risky.

However in the Bayesian framework we seek the posterior probability of the unknowns given the image data. Thus, in this approach there is no need to directly invert the sequential projection operator ; the variance of the posterior probability density of the unknowns given the data is crucially controlled by the strength of the priors on the unknowns (??). We develop the priors on the unknowns using as much of information as is possible. Priors on the kernel can be weak or strong depending on information available in the literature relevant to the imaged system. Thus, in lieu of strong priors that can be elicited from the literature, a distribution-free model for is motivated, characterised by weak priors on the shape of this unknown function. On the contrary, if the shape is better known in the literature for the material at hand, the case is illustrated by considering a parametric model for , in which priors are imposed on the parameters governing this chosen shape. Also, the material density function can be sparse or dense. To help with this, we need to develop a prior structure on that adapts to the sparsity of the density in its native space. It is in this context, that such a problem of multiple inversions is well addressed in the Bayesian framework.

### 3.1 Defining a general voxel

Lastly, we realise that the data at hand is intrinsically discrete, owing to the imaging mechanism. Then,

• for a given beam energy, collected image data is characterised by a resolution that depends on the size of the electron beam that is used to take the image, as well as the details of the particular instrument that is employed for the measurements. The SEM cannot “see” variation in structure at lengths smaller than the beam size. In practice, resolution is typically less than the beam size; is given by the microscopist. Then only one image datum is available over an interval of width along each of the and -axes, i.e. only one datum is available from a square of edge , located at the beam pointing. This implies that using such data, no variation in the material density can be estimated within any square (lying on the plane) of side , located at a beam pointing, i.e. , where , . The reason for not being able to predict the density over lengths smaller than , in typical real-life material samples, is discussed below.

• image data are recorded at discrete (chosen) values of the beam energy such that for a given beam pointing, a beam at a given value of carries information about the material density from within the sub-surface depth of . Then 2 beams at successive values and of allow for estimation of the material density within the depth interval that is bounded by the beam penetration depths. Here , =0. This implies that for a given beam pointing, over any such interval, no variation in the material density can be learnt from the data, i.e. where , . Again, the reason for inability to predict the density at any is explained below.

As is evident in these 2 bulletted points above, we are attempting to learn the density at discrete points identified by the resolution in the data. The inability to predict the density at any point inside the sample could be possible in alternative models, for example when a Gaussian Process (GP) prior is used to model the trivariate density function that is typically marked by disjoint support in and/or by sharp density contrasts (seen in a real density as in Figure 10 and those learnt from simulated image data, as in Figure 5). While the covariance structure of such a GP would be non-stationary (as motivated in the introductory section), the highly discontinuous nature of real-life density functions would compel this non-stationary covariance function to not vary smoothly. Albeit difficult, the parametrisation of such covariance kernels can in principle be modelled using a training data–except for the unavailability of such training data (comprising known values of density at chosen points inside the given 3-D sample). As training data is not at hand, parametrisation of the non-stationary covariance structure is not possible, leading us to resort to a fully discretised model in which we learn the density function at points inside the sample, as determined by the data resolution. In fact, the learnt density function could then be used as training data in a GP-based modelling scheme to help learn the covariance structure. Such an attempt is retained for future consideration.

Thus, prediction would involve (Bayesian) spatial regularisation which demands modelling of the abruptly varying spatial correlation structure that underlies the highly discontinuous, trivariate density function of real life material samples (see Figure 10). The spatial variation in the correlation structure of the image data cannot be used as a proxy for that of the material density function, since the former is not a pointer to the latter, given that compression of the density information within a 3-D interaction-volume generates an image datum. Only with training data comprising known material density at chosen points, can we model the correlation structure of the density function. In lieu of training data, as in the current case, modelling of such correlation is likely to fail.

Hence we learn the density function at the grid points of a 3-D Cartesian grid set up in the space of , and . No variation in the material density inside the -th grid-cell can be learnt, where this grid-cell is defined as the “-th voxel” which is the cuboid given by

1. square cross-sectional area (on a constant plane) of size , with edges parallel to the and axes, located at the -th beam incidence,

2. depth along the -axis ranging from ,

with , , . Then one pair of opposite vertices of the -th voxel are respectively at points and , where the -th beam incidence is at the point and

 xi:=imodulo(√Ndata)yi:=int(i/√Ndata)+1.. (3.1)

See Figure 2 for a schematic depiction of a general voxel. Here is the number of intervals of width along the -axis, as well as along the -axis. Thus, on the part of the plane that is imaged, there are squares of size . Formally, the density function that we can learn using such discrete image data is represented as

 ξ(k)i:=ρ(x,y,z)forx∈[xi,xi+ω),y∈[yi,yi+ω),z∈[h(k−1),h(k)), (3.2)

This representation suggests that a discretised version of the sought density function is what we can learn using the available image data. We do not invoke any distribution for the unknown parameters
. It is to be noted that the number of parameters that we seek is deterministically driven by the image data that we work with–the discretisation of the data drives the discretisation of the space of , and and the density in each of the resulting voxels is sought. Thus, when the image has a resolution and there are such image data sets recorded at a value of , the number of sought density parameters is where is the number of squares of size that comprise part of the surface of the material slab that is imaged. Here the number of parameters is typically large; in the application to real data that we present later, this number is 2250. We set up a generic model that allows for the learning of such large, data-driven number of distribution-free density parameters in a Bayesian framework.

### 3.2 Defining a general interaction-volume

Under the consideration of a hemi-spherical shape of the interaction-volume, the -th interaction-volume is completely specified by pinning its centre to the -th beam pointing at on the =0 plane and fixing its radius to , (?) where the constant of proportionality is known from microscopy theory222 In the context of the application to microscopy, atomic theory studies undertaken by ? suggest that the maximal depth to which an electron beam can penetrate inside a given material sample is (measured in m), for a beam of electrons of energy (measured in kV), inside material of mass density of (measured in gm cm), atomic number and atomic weight (per gm per mole), is (3.3) Here is an integer valued atomic number of the material, while and are positive-definite real valued constants. As increases, the depth and radial extent of the interaction-volume increases.and and are defined in Equation 3.1. In this hemispherical geometry, the maximal depth that the -th interaction volume goes down to is .

The measured image datum in the -th pixel, created at , is . This results when the convolution , of the unknown density and kernel in the voxels that lie inside the -th interaction-volume, is sequentially projected (as mentioned in Section 3), onto the centre of the -th interaction-volume, i.e. onto the point . This projection is referred to as . The integral representation of such a projection suggests 3 integrals, each over the three spatial coordinates that define the -th interaction-volume, according to

 C(ρ∗η)(k)i=∫R0(k)R=0∫θmaxθ=0RdRdθ∫z=zmax(R,θ)z=0ρ(s)∗η(s)dz∫RmaxR=0∫θmaxθ=0RdRdθ (3.4)

for . Here the vector represents value of displacement from the point of incidence , to a general point , inside the interaction-volume, i.e.

 s:=(x−xi,y−yi,z)T = (Rcosθ,Rsinθ,z)T R=√(x−xi)2+(y−yi)2 tanθ=y−yix−xi.

### 3.3 Cases classed by image resolution

We realise that the thickness of the electron beam imposes an upper limit on the image resolution, i.e. on the smallest length over which structure can be learnt in the available image data. For example, the resolution is finer when the SEM image is taken in Back Scattered Electrons ( 0.01m) than in X-rays ( 1m). The comparison between the cross-sectional areas of a voxel and of an interaction-volume, on the =constant plane is determined by

1. since the area of the voxel at any is .

2. the atomic-number parameter of the material at hand (see Equation 3.3) since the radius of the hemi-spherical interaction-volume is a monotonically decreasing function of , at .

Then we can think of 3 different resolution-driven models such that

1. voxel cross-sectional area on the =0 plane exceeds that of interaction-volumes attained at all , i.e.
.

2. voxel cross-sectional area on the =0 plane exceeds that of interaction-volumes at some values of but is lower than that of interaction-volumes attained at higher values, i.e. and .

3. voxel cross-sectional area on the =0 plane is lower than that of interaction-volumes attained at all , i.e. .

The first two models are realised for coarse resolution of the imaging technique, i.e. for large . This is contrasted with the 3rd model, which pertains to fine resolution, i.e. low . Since is a monotonically decreasing function of (Equation 3.3), the 1st model is feasible for “high- materials”, while the 2nd model is of relevance when the material at hand is a “low- material”. To make this more formal, we introduce two classes of materials as follows. For fixed values of all parameters but ,

 ifω2≥π[R0(Neng)]2|Z, high-Z'' material (3.6) ifω2<π[R0(Neng)]2|Z,  low-Z'' material.

The computation of the sequential projections involve multiple integrals (see Equation 3.4). The required number of such integrals can be reduced by invoking symmetries in the material density in any interaction-volume. Such symmetries become avaliable, as the resolution in the imaging technique becomes coarser. For example, for the 1st model, when the resolution is the coarsest, the minimum length learnt in the data exceeds the diameter of the lateral cross-section of the largest interaction-volume achieved at the highest value of , i.e for . (Here “lateral cross section” implies the cross-section on the =0 plane). Then for the coarsest resolution, (see Figure 3) and so, which means the lateral cross-section of the -th interaction-volume is contained inside that of the -th voxel. Now in our discretised representation, the material density inside any voxel is a constant. Therefore it follows that when resolution is the coarsest, material density at any depth inside an interaction-volume, is a constant independent of the angular coordinate (see Equation 3.4). Thus, the projection of onto the centre of the interaction-volume, does not involve integration over this angular coordinate.

However, such will not be possible for the 3rd model at finer imaging resolution. When the resolution is the finest possible with an SEM, is the smallest. In this case, the lateral cross-section of many voxels fill up the cross-sectional area of a given interaction volume. Thus, the density is varying within the interaction-volume. Thus, the projection of cannot forgo integrating over .

The last case that we consider falls in between these extremes of resolution; this is the 2nd model. Isotropy is valid but not for ; (see middle panel in Figure 3).

### 3.4 Noise in discrete image data

As discussed above in Section 1, noise in the data can crucially influence how well-conditioned the inverse problem is. Image data is invariably subjected to noise, though for images taken with Scanning Electron Microscopes (SEM), the noise is small. Also, imaging with SEM being a controlled experiment, this noise can be reduced by the microscopist by increasing the length of time taken to scan the sample with the electron beam that is made incident on the sample in this imaging procedure. As illustrations of the noise in real SEM data, ? suggest that in a typical 20 second scan of the SEM, the signal to noise is about 20:1, which can be improved to 120:1 for a longer acquisition time of 10 minutes333In modern SEMs, noise reduction is alternatively handled using pixel averaging (”Stereoscan 430 Operator Manual”, Leica Cambridge Ltd. Cambridge, England, 1994).. Importantly, this noise could arise due to to variations in the beam configurations, detector irregularities, pressure fluctuations, etc. In summary, such noise is considered to be due to random variations in parameters that are intrinsically continuous, motivating a Gaussian noise distribution. Thus, the noise in the 2-D image data, created by the -th beam pointing for , (i.e. the noise in ) is modelled as a random normal variable, drawn from the normal with standard deviation 0.05, . In the illustration with real data, a scan speed of 50 s was used, which implies a noise of less than 5.

### 3.5 Identifying low-rank and spatially-varying components of density

It is known in the literature that the general, under-determined deconvolution problem is solvable only if the unknown density is intrinsically, “sufficiently” sparse (??). Here we advance a methodology, within the frame of a designed experiment, to learn the density - sparse or dense. In the following subsection, we will see that priors on the sparsity that we develop here, bring in relatively more information into the models when the density structure is sparse.

With this in mind, the density is recognised to be made of a constant (the low-rank component in the limiting sense), and the spatially varying component that may be sparse or dense in . We view the constant part of the density as where is the Dirac delta function on (?), centred at the centre of the -th interaction volume, . Then in our problem, the contribution of the constant part of the density to the projection onto the centre of the -th interaction-volume is

 C(ρ0∗η(x,y,z)(k)i) ≡ ρ0C(δ(x−xi,y−yi,z)∗η(x,y,z))(k)i) (3.7) = I0(k),

a constant independent of the beam pointing location , if is restricted to be a function of the depth coordinate only. As is discussed in Section 4, this is indeed what we adopt in the model for the kernel.

Then, depends only on the known morphological details of the interaction-volume for a given value of , . Thus, , where is the spatially-varying component of the image data. The identification of the constant component of the density is easily performed as due to the constant component of the measurable.

In our inversion exercise, it is the field that is actually implemented as data, after is subtracted from , for each . Hereafter, when we refer to the data, the spatially-varying part of the data will be implied; it is this part of the data that will hereafter be referred to as , at each value of , . Its inversion will yield a spatially varying sparse/dense density, that we will from now, be referred to as that in general lies in a non-convex subset of . Thus, we see that in this model, it is possible for to be 0. The construction of the full density, inclusive of the low-rank and spatially-varying parts, is straightforward once the latter is learnt.

### 3.6 Basics of algorithm leading to learning of unknowns

The basic scheme of the learning of the unknown functions is as follows. First, we perform multiple projections of the convolution of the unknown functions in the forward problem, onto the incidence point of the -th interaction-volume, . The likelihood is defined as a function of the Euclidean distance between the spatially-averaged projection , and the image datum . We choose to work with a Gaussian likelihood (Equation 7.1). Since the imaging at any value of along any of the viewing vectors is done independent of all other values of and all other viewing vectors, we assume to be conditionally on the values of the model parameters.

We develop adaptive priors on sparsity on the density and present strong to weak priors on the kernel, to cover the range of high to low information about the kernel that may be available. In concert with these, the likelihood leads to the posterior probability of the unknowns, given all the image data. The posterior is sampled from using adaptive Metropolis-Hastings within Gibbs. The unknown functions are updated using respective proposal densities. Upon convergence, we present the learnt parameters with 95-highest probability density credible regions. At the same time, discussions about the identifiability of the solutions for the 2 unknown functions and of uniqueness of the solutions are considered in detail.

## 4 Kernel function

Identifiability between the solutions for the unknown material density and kernel is aided by the availability of measurement of the kernel at the surface of the material sample and the model assumption that the kernel function depends only on the depth coordinate and is independent of and . Thus, the correction function is . In the discretised space of , and , the kernel is then defined as

 η(k):=η(z)forz∈[h(k−1),h(k)),k=1,…,Neng. (4.1)

Then is the value of the kernel function on the material surface, i.e. at and this is available from microscopy theory (?).

On occasions when the shape of the kernel function is known for the material at hand, only the parameters of this shape need to be learnt from the data. Given the measured value , i.e. the measured , the total number of kernel parameters is then 2. In this case, the total number of parameters that we attempt learning from the data is , i.e. the 2 kernel shape parameters and number of material density parameters. We refer to such a model for the kernel as parametric. For other materials, the shape of the kernel function may be unknown. In that case, a distribution-free model of the parameters of the vectorised kernel function are learnt. In this case, the total number of parameters that we learn is .

We fall back on elicitation from the literature on electron microscopy to obtain the priors on the kernel parameters. In microscopy practice, the measured image data collected along an angle to the vertical is given by , where and with a proportionality constant that is not known apriori but needs to be estimated using system-specific Monte Carlo simulations or approximations based on curve-fitting techniques; see ?. is the distribution of the variable and is again not known apriori but the suggestion in the microscopy literature has been that this can be estimated using Monte Carlo simulations of the system or from atomistic models. However, these simulations or model-based calculations are material-specific and their viability in inhomogeneous material samples is questionable. Given the lack of modularity in the modelling of the relevant system parameters within the conventional approach, and the existence in the microscopy literature of multiple models that are distinguished by the underlying approximations (??), it is meaningful to seek to learn the correction function.

Our construct differs from this formulation in that we construct an infinitesimally small volume element of depth inside the material, at the point . In the limit 0, the density of the material inside this infinitesimally small volume is a constant, namely . Thus, the image datum generated from within this infinitesimally small volume - via convolution of the material density and the kernel - is . Thus, over this infinitesimal volume, our formulation will tie in with the representation in microscopic theory, if we set . It then follows that is motivated to have the same shape as , where is a known, material dependent constant444 depends upon whether the considered image is in Back Scattered Electrons (BSE) or X-rays. In he former case represents the BSE coefficient, often estimated with Monte-Carlo simulations and in the latter, it is the linear attenuation coefficient).. When information about this shape is invoked, the model for is referred to as parametric; however, keeping general application contexts in mind, a less informative prior for the unknown kernel is also developed.

### 4.1 Parametric model for correction function

Information is sometimes available on the shape of the function that represents the kernel. For example, we could follow the aforementioned suggestion of approximating the form of as , multiplied by the scale-factor , as long as is known for the material and imaging technique at hand. In fact, for the values of beam energy that we work at, for most materials , so that 1 ( http://physics.nist.gov/PhysRefData/XrayMassCoef/tab3.html, Goldstein et. al 2003). Thus, we approximate the shape of to resemble that of , as given in microscopy literature. This shape is akin to that of the folded normal distribution (?), so that we invoke the folded normal density function to write

 η(z)≡Q[exp(−(z−η0)22s2)+exp(−(z+η0)22s2)]. (4.2)

Thus, in this model, is deterministically known if the parameters , and are, where, and are the mean and dispersion that define the above folded normal distribution. Out of these three parameters, only two are independent since is known on the surface, i.e. is known (see Section 4). Thus, by setting , we relate deterministically to and . In this model of the correction function, we put folded normal priors on and . Thus, this model of the correction function is parametric in the sense that is parametrised, and priors are put on its unknown parameters.

### 4.2 Distribution-free model for correction function

In this model for the correction function, we choose folded normal priors for , i.e. with . The folded normal priors on (?) underlie the fact that and that there is a non-zero probability for to be zero. Thus, folded normal and truncated normal priors for would be relevant but gamma priors would not be. Also, microscopy theory allows for the kernel at the surface to be known deterministically, i.e. is known (Section 4). Then setting 1, we relate and . This relation is used to compute , while uniform priors are assigned to the hyper-parameters , . We refer to this as the “distribution-free model of ”.

We illustrate the effects of both modelling strategies in simulation studies that are presented in Section 9.

The correction function is normalised by , where is the estimated unscaled correction function in the first -bin, i.e. for and is the measured value of the correction function at the surface of the material, given in microscopy literature. It is to be noted that the available knowledge of allows for the identifiability of the amplitudes of and .

## 5 Priors on sparsity of the unknown density

In this section we present the adaptive priors on the sparsity in the density, developed by examining the geometrical aspects of the problem.

As we are trying to develop priors on the sparsity, we begin by identifying the voxels in which density is zero, i.e. . This identification will be made possible by invoking the nature of the projection operator . For example, we realise that it is possible for the measured image datum collected from the -th interaction-volume to be non-zero, even when density in the -th voxel is zero, owing to contributions to from neighbouring voxels that are included within the -th interaction-volume. Such understanding is used to identify the voxels in which density is zero. For a voxel in which density is non-zero, we subsequently learn the value of the density. The constraints that lead to the identification of voxels with null density can then be introduced into the model via the prior structure in the following ways.

1. We could check if = , (where projection onto the centre of the -th interaction-volume without including density from the -th voxel, i.e. without including ). If so, then . However, this check would demand the computation of over a given interaction-volume, as many times as there are voxels that lie fully or partially inside this interaction-volume. Such computation is avoided owing to its being computationally intensive. Instead, we opt for a probabilistic suggestion for when density in a voxel is zero. Above, , .

2. We expect that the projection of onto the centre of the interaction-volume achieved at will in general exceed the projection onto the same central point of a smaller interaction volume (at ). However, if the density of the -th voxel is zero or very low, the contributions from the interaction-volume generated at may not exceed that from the same generated at . Thus, it might be true that

 C(ρ∗η)(k)i≤C(ρ∗η)(k−1)i⟹ξ(k)i=0 (5.1)

. This statement is true with some probability. An alternate representation of the statement 5.1 is achieved as follows. We define the random variable , with , such that

 τ(k)i := C(ρ∗η)(k)iC(ρ∗η)(k−1)i,ifC(ρ∗η)(k)i≤C(ρ∗η)(k−1)iandC(ρ∗η)(k−1)i≠0 (5.2) τ(k)i := 1otherwise

Then the statement 5.1 is the same as the statement: “it might be true that or is very low”, with some probability . In fact, as the projection from the bigger interaction-volume is in general in excess of that from a smaller interaction-volume, we understand that closer is to , higher is the probability that is close to 0. In other words, the smaller is , higher is the probability that is close to 0, where

 ν(τ(k)i)=pτ(k)i(1−p)1−τ(k)i, (5.3)

with the hyper-parameter controlling the non-linearity of response of the function to increase in . The advantage of the chosen form of is that it is monotonic and its response to increasing value of its argument is controlled by a single parameter, namely . We assign a a hyper prior that is uniform over the experimentally-determined range of [0.6, 0.99], to ensure that is flatter for lower and steeper for higher , (as moves across the range ). The prior on the density parameter is then defined as

 π0(ξ(k)i)=exp[−(ξ(k)iν(τ(k)i))2]. (5.4)

Thus, [0,1] . Any normalisation constant on the prior can be subsumed into the normalisation of the posterior of the unknowns given the data; had we used a normalisation of , for all , at those for which or , the prior on would have reduced to being a normal . This is because, for such , so that . However, for the prior on is non-normal as the dispersion is itself dependent on . This prior then adapts to the sparsity of the material density distribution. We do not aim to estimate the degree of sparsity in the material density in our work but aim to develop a prior so that sampled from this prior represents the material density distribution at the given , however sparse the vector is. Evidently, we do not use a mixture prior but opt for no mixing as in ?. Indeed, in our prior structure, the term could have been replaced by , (as in parametric Laplace priors suggested by ?, ?, ?), i.e. by since is non-negative, but as far as sparsity tracking is concerned–which is our focus in developing this prior here–the prior in Equaion 5.4 suffices.

That such a prior probability density sensitively adapts to the sparsity in the material density distribution, is brought about in the results of 2 simulation studies shown in Figure 4. In these studies, the density parameter values in the -th voxel are simulated from 2 simplistic toy models that differ from each other in the degree of sparsity of the true material density distribution: , and respectively, (where are uniformly distributed random numbers in ), at a chosen and energy indices . In the simulations we specify the beam penetration depth as suggested by ?; as any interaction-volume is hemispherical, its radius . The kernel parameters are generated from a quadratic function of with noise added. In the simulations, the material is imaged at resolution such that , i.e. the “1st model” is relevant (see Section 3.3). This allows for simplification of the computation of according to Equation 3.4. Then at this , for , are plotted in Figure 4 against , as is the logarithm of the prior computed according to Equation 5.4, with held as a random number, uniform in [0.6,0.99]. Logarithm of the priors are also plotted as a function of the material density parameter. We see from the figure that the prior developed here tracks the sparsity of the vector well.

## 6 Three models for image inversion

In this section we develop the projection of the convolution of the unknown density and kernel onto the centre of the -th interaction volume, for each of the 3 models discussed in Section 3.3, motivated by the 3 different classes of image data classified by their resolution. The difference between the size of a voxel and that of an interaction-volume determines the difficulty in the inversion of to the image data.

As explained in Section 3.3, we attempt to identify means of dimensionality reduction, i.e. reducing the number of integrals involved in the sequential projection of , (see Equation 3.4). We do this by identifying isotropy in the distribution of the material density within the interaction-volume when possible, leading to elimination of the requirement of averaging over the angular coordinate.

### 6.1 1st and 2nd-models - Low resolution

This class of image resolutions (), pertain to the case when the system is imaged by an SEM in X-rays.

#### 6.1.1 High-Z Systems

In the 1st model, for the high- materials, the cross-sectional area of an interaction-volume attained at any will fit wholly inside that of a voxel, where the cross-section is on a -constant plane. Then at a given depth, the density inside this interaction-volume is that inside the voxel, i.e. is a constant. Thus, dimensionality reduction is most easily achieved in this case with the density function isotropic inside an interaction-volume, bearing no dependence on the angular coordinate of Equation 3.4. Then when we revisit Equation 3.4 in its discretised form, the discrete convolution within the -th -bin and the -th beam pointing gives , so that the projection onto the centre of the -th interaction-volume is discretised to give in Equation 3.4, gives the following.

 C(ρ∗η)(k)i=1(R0(k))2k∑q=0[(R0(q))2−(R0(q−1))22{q∑t=0((h(t)−h(t−1))t∑m=1ξ(m)iη(t−m))}]. (6.1)

#### 6.1.2 Low-Z Systems

In the 2nd model, for low- materials, for any , and , the cross-sectional area of the -th interaction-volume on the =0 plane will spill out of the -th voxel into the neighbouring voxels. Then only for at , isotropy within the -th interaction volume holds good. In general, the projection onto the centre of the -th interaction-volume includes contributions from all those voxels that lie wholly as well as partly inside this interaction volume. This projection is then computed by first learning the weighted average of the contributions from all the relevant voxels and then distributing this learnt weighted average over the identified voxels in the proportion of the weights which in turn are in proportion of the voxel volume that lies within the -th interaction-volume. Thus, for the -th voxel that is a neighbour of the -th voxel that lies wholly inside the -th interaction-volume, let this proportion be , where .

In general, at a given , any bulk voxel has 8 neighbouring voxels and when the voxel lies at the corner or edge of the sample, number of nearest neighbours is less than 8. Then, at a given , there will be contribution from at most 9 voxels towards . At , for any , let the maximum number of nearest neighbours be so that . The notation for this number bears its dependence on both and . We define as the weighted average of the densities in the -th voxel and its nearest neighbours that are fully or partially included within the -th interaction-volume. Here , , . Thus,

 ¯ξ(k′)i:=imax|i,k′∑i′=1ξ(k′)i′|iw(k′)i′|i, (6.2)

where the -th neighbour of the -th voxel at the same depth, harbours the density and there is a maximum of such neighbours. The effect of this averaging over the nearest neighbours at this depth, is equivalent to averaging over the angular coordinate and results in the angular averaged density at this , which by definition, is isotropic, i.e. independent of the angular coordinate. Then the projection onto the centre of the -th interaction-volume, for , is computed as in Equation 6.1 with the material density term on the RHS of this equation replaced by the isotropic angular averaged density . However, for , the projection is computed as in Equation 6.1.

### 6.2 3rd model - Fine resolution

In certain imaging techniques, such as imaging in Back Scattered Electrons by an SEM or FESEM, the resolution , . In this case, at a given , material density from multiple voxels contribute to and the three integrals involved in the computation of this projection, as mentioned in Equation 3.4, cannot be avoided. Knowing the shape of the interaction-volume, it is possible to identify voxels that live partly or wholly inside the -th interaction-volume as well as compute the fractional or full volume of each such voxel inside the -th interaction volume.

For this model, the projection equation is written in terms of the coordinates of a point instead of the polar coordinate representation of this point, where the point in question lies inside the -th interaction-volume that is centred at . Then inside the -th interaction-volume, at a given and , . For

 x−xi∈[(u−1)ω,uω]u=−(int)(R0(k)ω)+1,−(int)(R0(k)ω)